Re: [PATCH] xhci: guard accesses to ep_state in xhci_endpoint_reset()
From: Phil Elwell <hidden>
Date: 2021-09-01 13:15:30
Also in:
lkml
Hi Mathias, On 01/09/2021 10:21, Mathias Nyman wrote:
On 31.8.2021 19.02, Phil Elwell wrote:quoted
From: Jonathan Bell <redacted> See https://github.com/raspberrypi/linux/issues/3981Thanks, so in a nutshell the issue looks something like: [827586.220071] xhci_hcd 0000:01:00.0: WARN Cannot submit Set TR Deq Ptr [827586.220087] xhci_hcd 0000:01:00.0: A Set TR Deq Ptr command is pending. [827723.160680] INFO: task usb-storage:93 blocked for more than 122 seconds. The blocked task is probably because xhci driver failed to give back the URB after failing to submit a "Set TR Deq Ptr" command. This part should be fixed in: https://lore.kernel.org/r/20210820123503.2605901-4-mathias.nyman@linux.intel.com (local) which is currently in usb-next, and should be in 5.15-rc1 and future 5.12+ stable.quoted
Two read-modify-write cycles on ep->ep_state are not guarded by xhci->lock. Fix these.This is probably one cause for the "Warn Cannot submit Set TR Deq Ptr A Set TR Deq Ptr command is pending" message. Another possibility is that with UAS and streams we have several transfer rings per endpoint, meaning that if two TDs on separate stream rings on the same endpoint both stall, or are cancelled we could see this message. The SET_DEQ_PENDING flag in ep->ep_state should probably be per ring, not per endpoint. Then we also need a "rings_with_pending_set_deq" counter per endpoint to keep track when all set_tr_deq commands complete, and we can restart the endpoint
Jonathan, the author of the patch, may give some detailed feedback on these statements when he has a moment - "Well, sort of... it's complicated" was the summary.
Anyway, my patch linked above together with this patch should make these errors a lot more harmless.
Yes, I think that's true. We have a downstream patch to warn about a pending Set TR Deq Ptr command but proceed anyway, allowing systems to recover, but with the additional spin lock usage our users are reporting no failures _and_ no warnings.
Let me know if you can trigger the issue with both these patches applied.
We've not tried your patch yet.
I'll add your patch to the queue as well.
Many thanks, Phil