On Sun, 22 Apr 2012, Ming Lei wrote:
On Sun, Apr 22, 2012 at 8:50 PM, Alan Stern [off-list ref] wrote:
On Sun, 22 Apr 2012, Ming Lei wrote:
Although the kerneldoc doesn't actually say so, it should be safe to
assume that usb_unlink_urb calls the completion routine directly _only_
in cases where the unlink succeeded. (We could add this to the
kerneldoc.)
Therefore: If the URB completes with status other than -ECONNRESET then
you can safely take the lock for resubmission. If the URB completes
with status == -ECONNRESET then you know it was unlinked, so you don't
need to take the lock -- the race has already been lost.
Does that solve your problem?
Not sure if that does work.
If the URB completes asynchronously after unlinking, its status is still
-ECONNRESET, so extra race may be caused without holding the lock
because complete handler will access some global data.
That would be a completely separate race, right? So maybe it can use a
Not sure, at least in both usbnet and usbhid cases, the lock is held before
usb_unlink_urb, and the same lock is to be acquired in the URB complete
handler.
different lock for protection -- and this other lock could be dropped
before usb_unlink_urb is called.
If the lock which is to be acquired in the URB complete handler is dropped
before calling usb_unlink_urb, one new submitted URB in complete handler
may be unlinked, as mentioned by Oliver already.
We are now talking about two locks. One of them is held during the
call to usb_unlink_urb; the completion handler does not acquire that
lock if the URB's status is -ECONNRESET. The other lock is dropped
before usb_unlink_urb is called, so the completion handler can safely
grab it.
On Mon, 23 Apr 2012, Oliver Neukum wrote:
If the URB completes asynchronously after unlinking, its status is still
-ECONNRESET, so extra race may be caused without holding the lock
because complete handler will access some global data.
That is the race. And you need not invoke global data. The original
race opens again if you are submitting a new URB without the lock
held.
This is because we cannot be sure that the same URB is unlinked
only once. A subsequent timeout may kill the wrong URB if the
first is unlinked so that the callback really comes in interrupt.
But the basic idea is brilliant. It's just that the one-way logical implication:
recursive direct call of the callback -> status == -ECONNRESET
is not strong enough. But that is very easy to fix. As we know whether
the callback is directly called or not, all we need to do is differentiate
the cases in urb->status, by introducing a new error code.
I don't like the idea of changing the status codes. It would mean
changing usb_kill_urb too.
Instead of changing return codes or adding locks, how about
implementing a small state machine for each URB?
	Initially the state is ACTIVE.

	When the URB times out, acquire the lock. If the state is not
	equal to ACTIVE, drop the lock and return immediately (the URB
	is being unlinked concurrently). Otherwise set the state to
	UNLINK_STARTED, drop the lock, call usb_unlink_urb, and
	reacquire the lock. If the state hasn't changed, set it back
	to ACTIVE. But if the state has changed to UNLINK_FINISHED,
	set it to ACTIVE and resubmit.

	In the completion handler, grab the lock. If the state
	is ACTIVE, resubmit. But if the state is UNLINK_STARTED,
	change it to UNLINK_FINISHED and don't resubmit.
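
The steps above can be sketched as a small user-space model. This is only
an illustration, not driver code: struct urb_ctx, ctx_lock/ctx_unlock,
fake_submit, fake_unlink and deliver_completion are all hypothetical
stand-ins (for the driver's private data, its spinlock, usb_submit_urb,
usb_unlink_urb and the interrupt-time callback respectively), and the
sync_unlink flag merely selects whether the simulated unlink calls the
completion handler directly or leaves it to be delivered later. Resubmitting
while the lock is still held is also an assumption of the sketch.

```c
#include <stdbool.h>

/* Per-URB states, named after the description above. */
enum urb_state { URB_ACTIVE, URB_UNLINK_STARTED, URB_UNLINK_FINISHED };

/* Hypothetical per-URB bookkeeping; not a real kernel structure. */
struct urb_ctx {
	enum urb_state state;
	bool lock_held;          /* models the driver's spinlock */
	bool sync_unlink;        /* true: unlink calls the handler directly */
	bool completion_pending; /* true: handler still has to run later */
	int submissions;         /* number of simulated resubmissions */
};

static void ctx_lock(struct urb_ctx *c)    { c->lock_held = true; }
static void ctx_unlock(struct urb_ctx *c)  { c->lock_held = false; }
static void fake_submit(struct urb_ctx *c) { c->submissions++; }

/* Completion handler: resubmit only if no unlink is in progress. */
void on_complete(struct urb_ctx *c)
{
	ctx_lock(c);
	if (c->state == URB_ACTIVE)
		fake_submit(c);
	else if (c->state == URB_UNLINK_STARTED)
		c->state = URB_UNLINK_FINISHED; /* timeout path resubmits */
	ctx_unlock(c);
}

/* Simulated unlink: either completes directly or defers completion. */
static void fake_unlink(struct urb_ctx *c)
{
	if (c->sync_unlink)
		on_complete(c);
	else
		c->completion_pending = true;
}

/* Simulated asynchronous delivery of a deferred completion. */
void deliver_completion(struct urb_ctx *c)
{
	if (c->completion_pending) {
		c->completion_pending = false;
		on_complete(c);
	}
}

/* Timeout path: the lock is dropped around the unlink call. */
void on_timeout(struct urb_ctx *c)
{
	ctx_lock(c);
	if (c->state != URB_ACTIVE) {
		/* The URB is being unlinked concurrently. */
		ctx_unlock(c);
		return;
	}
	c->state = URB_UNLINK_STARTED;
	ctx_unlock(c);

	fake_unlink(c);

	ctx_lock(c);
	if (c->state == URB_UNLINK_FINISHED) {
		/* Handler already ran and declined to resubmit. */
		c->state = URB_ACTIVE;
		fake_submit(c);
	} else {
		/* Handler has not run yet; it will resubmit itself. */
		c->state = URB_ACTIVE;
	}
	ctx_unlock(c);
}
```

In this model the URB is resubmitted exactly once either way: by
on_timeout when the handler runs inside the unlink call, or by the
handler itself when the completion arrives afterwards.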
This is a better approach, in that it doesn't make any assumptions
regarding synchronous vs. asynchronous unlinks. If you want, you could
have two different ACTIVE substates, one for URBs which haven't yet
been unlinked and one for URBs which have been. Then you could avoid
unlinking the same URB twice.
Alan Stern