Thread (7 messages) 7 messages, 4 authors, 2021-09-25

Re: [PATCH] Urgent bug fix causing Apple SSDs to not work.

From: Keith Busch <kbusch@kernel.org>
Date: 2021-09-25 20:39:02
Also in: lkml

Possibly related (same subject, not in this thread)

On Sat, Sep 25, 2021 at 10:08:53PM +0200, Sven Peter wrote:
I actually ran into a similar issue while adding support for the NVMe
controller found on the M1 and assumed it was only present there.

Some background why this happens: ANS2 is a co-processor that emulates
an NVMe MMIO interface and uses the tag as an index to an internal data
structure. 
Thanks for confirming the behavior. The patch should restore the
command_id values to when everything was working. I'll just need to
update the quirk description to better align with the actual limitation
if the patch is successful. 
On the M1 we can directly talk to ANS2 and while we can submit
commands with a higher index it'll just ignore the upper bits and only
return the lowest eight IIRC in the completion queue.
I guess whatever software is running on the T2 actually has an assert to
ensure that the tag is within the limits before forwarding the command
to ANS2.
Is the PCI Device ID for the M1 the same as reported for the T2? Either
0x2005 or 0x2003 should make this quirk apply.

Side note, I mistakenly added an entry for device ID 0x2006, but that
was from me misreading the report.
Best,

Sven


On Sat, Sep 25, 2021, at 21:54, Keith Busch wrote:
quoted
On Sat, Sep 25, 2021 at 11:47:08AM -0700, Linus Torvalds wrote:
quoted
On Fri, Sep 24, 2021 at 9:02 PM Aditya Garg [off-list ref] wrote:
quoted
From: Aditya Garg <redacted>
Date: Fri, 24 Sep 2021 15:36:45 +0530
Subject: [PATCH] Revert nvme to 5.14.5 to fix incompatibility arised in Apple SSDs.
Fixes: e7006de6c238 (nvme: code command_id with a genctr for use-after-free validation)
I think we need to hear more about the problem than just revert a
commit like this randomly. That commit has already been picked up for
-stable,

What are the exact symptoms, and which Apple SSD is this?

I do find this:

  https://lore.kernel.org/all/cjJiSFV77WM51ciS8EuBcdeBcv9T83PUB-Kw3yi8PuC_LwrrUUnQ3w5RC1PbKvSYE72KryXp3wOJhv4Ov_WWIe2gKWOOo5uwuUjbbFA8HDM=@protonmail.com/ (local)

which instead of a revert has an actual patch. Can you try that one?

Keith Busch replied to that one, saying that the Apple SSD might not
be spec compliant, but hey, what else is new? If we start demanding
that hardware comply with specs, we'd have to scrap the whole notion
of working in the real world. Plus it would be very hypocritical of
us, since we ignore all specs when we deem them too limiting (whether
they be language specs, POSIX OS specs, or whatever).
Right, we have a lot of quirks for the apple controllers, what's one
more? :)

Could the following patch be tried? I'm basing this off the 'lspci' from
Orlando, but I'm assuming the previous model has the same limitation,
too.

---
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 7efb31b87f37..f0787233557f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -979,6 +979,7 @@ EXPORT_SYMBOL_GPL(nvme_cleanup_cmd);
 blk_status_t nvme_setup_cmd(struct nvme_ns *ns, struct request *req)
 {
 	struct nvme_command *cmd = nvme_req(req)->cmd;
+	struct nvme_ctrl *ctrl = nvme_req(req)->ctrl;
 	blk_status_t ret = BLK_STS_OK;

 	if (!(req->rq_flags & RQF_DONTPREP)) {
@@ -1027,7 +1028,8 @@ blk_status_t nvme_setup_cmd(struct nvme_ns *ns, 
struct request *req)
 		return BLK_STS_IOERR;
 	}

-	nvme_req(req)->genctr++;
+	if (!(ctrl->quirks & NVME_QUIRK_SKIP_CID_GEN))
+		nvme_req(req)->genctr++;
 	cmd->common.command_id = nvme_cid(req);
 	trace_nvme_setup_cmd(req, cmd);
 	return ret;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 9871c0c9374c..b49761d30df7 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -86,6 +86,12 @@ enum nvme_quirks {
 	 */
 	NVME_QUIRK_NO_DEEPEST_PS		= (1 << 5),

+	/*
+	 * The controller requires the command_id value be be limited to the
+	 * queue depth.
+	 */
+	NVME_QUIRK_SKIP_CID_GEN			= (1 << 6),
+
 	/*
 	 * Set MEDIUM priority on SQ creation
 	 */
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b82492cd7503..d9f22ed68185 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -3369,7 +3369,10 @@ static const struct pci_device_id nvme_id_table[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2005),
 		.driver_data = NVME_QUIRK_SINGLE_VECTOR |
 				NVME_QUIRK_128_BYTES_SQES |
-				NVME_QUIRK_SHARED_TAGS },
+				NVME_QUIRK_SHARED_TAGS ,
+				NVME_QUIRK_SKIP_CID_GEN },
+	{ PCI_DEVICE(PCI_VENDOR_ID_APPLE, 0x2006),
+		.driver_data = NVME_QUIRK_SKIP_CID_GEN },

 	{ PCI_DEVICE_CLASS(PCI_CLASS_STORAGE_EXPRESS, 0xffffff) },
 	{ 0, }
--
-- 
Sven Peter
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help