Thread (7 messages) 7 messages, 3 authors, 2021-09-01

Re: [PATCH v4] libata: Add ATA_HORKAGE_NO_NCQ_ON_AMD for Samsung 860 and 870 SSD.

From: Hans de Goede <hidden>
Date: 2021-09-01 08:55:47
Also in: lkml
Subsystem: libata subsystem (serial and parallel ata drivers), the rest · Maintainers: Damien Le Moal, Niklas Cassel, Linus Torvalds

Hi Tor,

On 9/1/21 9:37 AM, torvic9@mailbox.org wrote:
(Sorry for not doing a proper reply)

Hello,
Noob here.
I have a Samsung 860 Pro connected to a AMD X570 chipset mainboard and
it just works flawlessly on 5.13 and 5.14.
Are you sure that *every* 860/870 is concerned by this problem on
*every* AMD controller?
I am pretty sure that every 860 / 870 EVO is affected,
I am not sure if the PRO is also affected.

As for *every* AMD controller, chances are that more recent
AMD controllers are fine.

We have been trying to resolve various issues with this combo
for a long time now, see:

https://bugzilla.kernel.org/show_bug.cgi?id=201693
https://bugzilla.kernel.org/show_bug.cgi?id=203475
Isn't this too restrictive?
Or am I simply missing something?
The problem is that when users are hit by this they end up with
a non functional system and even fs / data  corruption. Where
as OTOH disabling NCQ leads to a (significant) performance
degradation but affected systems will still work fine.

So I believe that it is best to err on the safe side here
and accept the performance degradation as a trade-of for
fixing the fs / data corruption.

With that said, I do believe that we should allow re-enabling
ncq on this combo through libata.force on the kernel cmdline
by adding this extra bit to the patch:
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6136,6 +6136,8 @@ static int __init ata_parse_force_one(char **cur,
 		{ "ncq",	.horkage_off	= ATA_HORKAGE_NONCQ },
 		{ "noncqtrim",	.horkage_on	= ATA_HORKAGE_NO_NCQ_TRIM },
 		{ "ncqtrim",	.horkage_off	= ATA_HORKAGE_NO_NCQ_TRIM },
+		{ "noncqamd",	.horkage_on	= ATA_HORKAGE_NO_NCQ_ON_AMD },
+		{ "ncqamd",	.horkage_off	= ATA_HORKAGE_NO_NCQ_ON_AMD },
 		{ "dump_id",	.horkage_on	= ATA_HORKAGE_DUMP_ID },
 		{ "pio0",	.xfer_mask	= 1 << (ATA_SHIFT_PIO + 0) },
 		{ "pio1",	.xfer_mask	= 1 << (ATA_SHIFT_PIO + 1) },
And I will also add a comment to both linked bugs to see if we can maybe
exclude the pro models from this quirk and if we can maybe narrow it down
to a subset of the AMD SATA controllers.

But that narrowing down is probably best done as a follow up fix, while just
going with this "err on the safe side" approach for now.

Regards,

Hans
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help