Thread (17 messages), 5 authors, 2021-05-24

Re: Recovering from transaction errors [was: Re: [syzbot] INFO: rcu detected stall in tx]

From: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Date: 2021-05-21 02:12:49

Alan Stern wrote:
On Thu, May 20, 2021 at 09:23:57PM +0000, Thinh Nguyen wrote:
Alan Stern wrote:
If the cable is unplugged, then we should get a connection change event
and the driver can handle it properly.
Yes -- unless the driver is in such a tight retry loop that the rest of 
the system never gets a chance to process the connection change event.  
I've seen bug reports where that happened.
I see. I'll keep that in mind, but it sounds like a HW issue? The driver
handles retries based on events generated by the HW, and the HW should
properly generate a connection event and not get stuck in some loop.
The hardware _does_ generate disconnect events.  The problem is that the 
class driver doesn't react properly to transaction errors and thereby 
prevents the rest of the system from handling the disconnect events.  
It's a bug in the class driver, not in the hardware.
Ok. Got it.
For the case in question (the syzbot bug report that started this 
thread), the class driver doesn't try to perform any recovery.  It just 
resubmits the URB, getting into a tight retry loop which consumes too 
much CPU time.  Simply giving up would be preferable.

Alan Stern
I see. By giving up, you mean doing a port reset, right? Otherwise it needs
some other mechanism to synchronize with the device side.
No, I mean the driver should just stop communicating with the device.  
That's an appropriate action for lots of drivers.  If the user wants to 
re-synchronize with the device, he can unplug the USB cable and plug it 
back in again.

Alan Stern
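
The difference between resubmitting forever and giving up can be illustrated with a toy user-space model. This is only a sketch, not kernel code: `ToyDriver`, `run`, and `MAX_CONSECUTIVE_ERRORS` are hypothetical names, and the threshold of 3 is an assumption (the thread does not name a number).

```python
from dataclasses import dataclass

# Hypothetical threshold; the thread does not specify one.
MAX_CONSECUTIVE_ERRORS = 3


@dataclass
class ToyDriver:
    """Toy stand-in for a class driver's URB completion logic."""
    consecutive_errors: int = 0
    stopped: bool = False
    submissions: int = 0

    def submit(self) -> bool:
        """Pretend to submit an URB; refuse once the driver has given up."""
        if self.stopped:
            return False
        self.submissions += 1
        return True

    def on_complete(self, ok: bool) -> None:
        """Completion callback: resubmit on success, give up on repeated errors."""
        if ok:
            self.consecutive_errors = 0
            self.submit()
            return
        self.consecutive_errors += 1
        if self.consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
            # Stop communicating with the device instead of looping forever.
            self.stopped = True
            return
        self.submit()


def run(driver: ToyDriver, results: list) -> None:
    """Feed completion results (True = success) until the driver stops."""
    if not driver.submit():
        return
    for ok in results:
        if driver.stopped:
            break
        driver.on_complete(ok)
```

Setting the threshold to 1 makes the model give up on the first error, which is closest to the "simply giving up" behavior described above. A driver without any such check would resubmit once per failed completion, with no bound.
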
Ok. Would it be more difficult to automate this if it requires user
intervention? I assume syzbot doesn't want the user to do that.
Difficult to automate what, exactly?  Unplugging the USB cable?  How 
could you possibly automate that?

At the moment, I think the best approach is Guido's suggestion to reject 
URBs submitted to endpoints that have gotten a transaction error, until 
the error status has somehow been cleared.  Is that what you would like 
to see automated?
First, I just want to point out that I'm not familiar with syzbot. I was
just thinking that if this issue occurs and the user wants to start a new
test, she shouldn't have to unplug and replug the device; some application
could automatically trigger a new test after a failure.

BR,
Thinh
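
The suggestion attributed to Guido above (reject URBs submitted to an endpoint that has seen a transaction error until the error status is cleared) can be sketched as a small latch. This is a toy user-space model under stated assumptions, not the USB core's API: `ToyEndpoint`, `EndpointErrorPending`, and `clear_error` are hypothetical names, and how the error would actually be cleared is left open in the thread.

```python
class EndpointErrorPending(Exception):
    """Raised when a submission is refused because an error is still latched."""


class ToyEndpoint:
    """Toy endpoint that latches a transaction error until it is cleared."""

    def __init__(self) -> None:
        self.error_latched = False
        self.accepted = 0

    def submit_urb(self) -> None:
        """Accept a submission only while no error is latched."""
        if self.error_latched:
            raise EndpointErrorPending("uncleared transaction error")
        self.accepted += 1

    def complete(self, transaction_error: bool) -> None:
        """Record the outcome of a transfer; an error sets the latch."""
        if transaction_error:
            self.error_latched = True

    def clear_error(self) -> None:
        """Hypothetical hook: whatever mechanism ends up clearing the error."""
        self.error_latched = False
```

In this model a driver that blindly resubmits gets an immediate refusal instead of another failing transfer, so the retry loop cannot keep the hardware busy. An application that wants to start a new test could call the clearing hook rather than requiring the cable to be unplugged.
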