Thread: 6 messages, 2 authors, 2021-09-26

Re: [REGRESSION] nvme: code command_id with a genctr for use-after-free validation crashes apple T2 SSD

From: Keith Busch <kbusch@kernel.org>
Date: 2021-09-25 17:16:51
Also in: regressions

On Sat, Sep 25, 2021 at 01:10:42PM +0000, Orlando Chamberlain wrote:
> Commit e7006de6c238 causes the SSD controller on Apple T2 computers to crash
> and prevents linux from booting.
>
> This commit implemented a counter that is stored within the NVMe command_id,
> however this counter makes the command_id higher than normal, causing a panic
> on the T2 security chip that functions as the SSD controller, which then
> causes the system to power off after a few seconds.

Ah, yet another spec non-compliant quirk from these controllers.
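
For readers following along, here is a minimal sketch of the packing described above. It assumes the generation counter occupies the upper 4 bits of the 16-bit command_id and the block-layer tag the lower 12 bits; the macro and function names are illustrative, not the driver's own.

```c
#include <stdint.h>

/* Assumed layout: bits 15..12 = generation counter, bits 11..0 = tag. */
#define CID_GENCTR_SHIFT 12
#define CID_GENCTR_MASK  0xfu
#define CID_TAG_MASK     0xfffu

/* Pack a tag and a generation counter into one 16-bit command id. */
static uint16_t cid_encode(uint16_t tag, uint8_t genctr)
{
    return (uint16_t)(((genctr & CID_GENCTR_MASK) << CID_GENCTR_SHIFT) |
                      (tag & CID_TAG_MASK));
}

/* Recover the generation counter from a command id. */
static uint8_t cid_genctr(uint16_t cid)
{
    return (uint8_t)((cid >> CID_GENCTR_SHIFT) & CID_GENCTR_MASK);
}

/* Recover the tag from a command id. */
static uint16_t cid_tag(uint16_t cid)
{
    return (uint16_t)(cid & CID_TAG_MASK);
}
```

Under that assumption, the cid = 4120 (0x1018) reported in the crash log later in this message would decode to generation 1 and tag 24, which is a small tag with a single generation bit set on top of it.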

> This was reported on bugzilla here:
> https://bugzilla.kernel.org/show_bug.cgi?id=214509 but it was not originally
> classified as NVMe (when the report was created it was unknown what was
> causing it), so I don't know if it notified the NVMe mailing list when it
> was later reclassified to NVMe. Sorry if you've already seen this issue.

The mailing list was not copied, so thank you for directly notifying
this list.
 
> The T2 security chip (which is the SSD) has this line in its crash log (the
> rest of this log is in an attachment on the bugzilla report):
>
> panic(cpu 1 caller 0xfffffff028d884ec): ANS2 Recoverable Panic - assert failed: [7447]:command id out of range error (cid = 4120), status_reg: 0x2000 - Null(2)
>
> This is the entry in lspci -nn for the ssd:
>
> 04:00.0 Mass storage controller [0180]: Apple Inc. ANS2 NVMe Controller [106b:2005] (rev 01)
>
> This commit was included in 5.14.6 and backported to 5.10.67, but does not
> occur in 5.14.5 and 5.10.66. I am on a MacBookPro16,1, the crash has been
> reproduced on a MacBookPro16,2 as well.

Is the PCI VID:DID the same as in your lspci output for all affected
MacBooks?

> I have been able to reproduce on Arch
> Linux with vanilla kernel 5.10.67 (others have gotten it on 5.14.6) with no
> DKMS modules, and I bisected it to that commit
> (e7006de6c23803799be000a5dcce4d916a36541a).
>
> I've tried to modify the genctr so that it is in the other side of the
> command_id (which I thought might make the command_id's lower) with the patch
> below, but it did not prevent the crash.

That might mean the h/w is using the command id as an index into
internal structures. That is not spec compliant, so it sounds like
we'll need to introduce another quirk for the macs.
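
One way such a quirk could look, as a rough sketch only: a per-controller flag that makes the driver leave the generation counter out of the command id, so the device only ever sees the bare tag. The flag name, struct, and helpers below are hypothetical and are not taken from the driver; the only device listed is the 106b:2005 ID from the lspci output above, and the bit layout is the same assumed 4-bit/12-bit split.

```c
#include <stdint.h>

/* Hypothetical quirk bit: do not embed a generation counter in the cid. */
#define QUIRK_SKIP_CID_GEN (1u << 0)

/* Hypothetical PCI identity of a controller. */
struct ctrl_id {
    uint16_t vendor;
    uint16_t device;
};

/* Look up quirks for a controller; only the reported Apple ANS2 ID is listed. */
static unsigned int quirks_for(struct ctrl_id id)
{
    if (id.vendor == 0x106b && id.device == 0x2005)
        return QUIRK_SKIP_CID_GEN;
    return 0;
}

/* Build the command id, honouring the quirk. */
static uint16_t make_cid(uint16_t tag, uint8_t genctr, unsigned int quirks)
{
    if (quirks & QUIRK_SKIP_CID_GEN)
        return tag; /* device sees only the tag, as before the commit */
    /* Assumed layout: upper 4 bits generation counter, lower 12 bits tag. */
    return (uint16_t)(((genctr & 0xfu) << 12) | (tag & 0xfffu));
}
```

With the quirk set, tag 24 stays 24 regardless of the counter; without it, tag 24 at generation 1 becomes 4120, the value the ANS2 firmware rejected. The trade-off is that quirked controllers lose the use-after-free validation the counter was meant to provide.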

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme