Thread (48 messages, 6 authors, 2026-04-10)

Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap

From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2026-03-23 23:58:01
Also in: linux-acpi, linux-iommu, linux-pci, lkml

On Wed, Mar 18, 2026 at 04:23:53PM -0700, Nicolin Chen wrote:
> If the software times out first at 1s, it means the CMDQ is still
> pending on wait for the completion of ATC invalidation. Then, the
> caller sees -ETIMEDOUT and tries to bisect the ATC batch or update
> the STE directly, either of which involves CMDQ. But CMDQ has not
> recovered yet.

Yeah, I don't know if the SW timeout flow is really all that RASy here
right now. Without somehow recovering the CMDQ it is pointless to try
to continue after a timeout.

And we are really in trouble if things like normal IOTLB invalidation
start to fail.

I think the right thing is to somehow recover the CMDQ, restart it
from the commands that haven't been SYNC'd yet, and just keep
trying, maybe with progressively longer timeouts.
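
A minimal userspace sketch of the retry policy described above, assuming a simplified model: commands complete strictly in order, `synced` stands in for the prefix of commands covered by a completed CMD_SYNC, and `sim_recover()` is only a placeholder for whatever real CMDQ recovery would involve. All names (`sim_cmdq`, `sim_wait`, `sim_recover`, `sim_submit_with_retry`) are hypothetical and are not part of the arm-smmu-v3 driver.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_MAX_CMDS 8

/* Hypothetical model of a command queue; not the arm-smmu-v3 data structures. */
struct sim_cmdq {
	unsigned int latency_ms[SIM_MAX_CMDS]; /* simulated HW time per command */
	unsigned int nr_cmds;                  /* commands submitted */
	unsigned int synced;                   /* commands confirmed complete */
	unsigned int recoveries;               /* times the queue was "recovered" */
};

/*
 * Let the simulated HW run for timeout_ms, resuming at the first command
 * not yet confirmed complete. Returns true if every submitted command
 * completed before the timeout.
 */
static bool sim_wait(struct sim_cmdq *q, unsigned int timeout_ms)
{
	unsigned int elapsed = 0;

	assert(q->nr_cmds <= SIM_MAX_CMDS);
	while (q->synced < q->nr_cmds) {
		unsigned int cost = q->latency_ms[q->synced];

		if (elapsed + cost > timeout_ms)
			return false; /* timed out with commands still pending */
		elapsed += cost;
		q->synced++;
	}
	return true;
}

/* Placeholder for real CMDQ recovery; in this model it only counts calls. */
static void sim_recover(struct sim_cmdq *q)
{
	q->recoveries++;
}

/*
 * On timeout, recover the queue and retry from the first unsynced command
 * with a doubled timeout, up to max_attempts.
 * Returns the number of attempts used, or -1 if commands are still pending.
 */
static int sim_submit_with_retry(struct sim_cmdq *q,
				 unsigned int base_timeout_ms, int max_attempts)
{
	unsigned int timeout = base_timeout_ms;

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (sim_wait(q, timeout))
			return attempt;
		sim_recover(q);
		timeout *= 2; /* progressively longer timeout */
	}
	return -1;
}
```

For example, with per-command latencies of 1, 1, 3000 and 1 ms and a 1000 ms base timeout, the first two commands complete on the first attempt, the third only fits once the timeout reaches 4000 ms, and the call returns 3 after two recoveries. Doubling keeps the number of attempts logarithmic in the slowest command's latency.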

Just ignoring the error and continuing doesn't seem safe.

But that's something else again. As long as ATC invalidation reliably
hits the HW timeout first, we should be OK to ignore it in this
series.

Jason