Thread: 3 messages, 3 authors, 2015-09-01

Re: [PATCH 2/2] usbnet: Fix a race between usbnet_stop() and the BH

From: Eugene Shatokhin <hidden>
Date: 2015-08-28 08:10:07
Also in: lkml

25.08.2015 00:01, Bjørn Mork wrote:
Eugene Shatokhin [off-list ref] writes:
The race may happen when a device (e.g. YOTA 4G LTE Modem) is
unplugged while the system is downloading a large file from the Net.

Hardware breakpoints and Kprobes with delays were used to confirm that
the race does actually happen.

The race is on skb_queue ('next' pointer) between usbnet_stop()
and rx_complete(), which, in turn, calls usbnet_bh().

Here is a part of the call stack with the code where the changes to the
queue happen. The line numbers are for the kernel 4.1.0:

*0 __skb_unlink (skbuff.h:1517)
     prev->next = next;
*1 defer_bh (usbnet.c:430)
     spin_lock_irqsave(&list->lock, flags);
     old_state = entry->state;
     entry->state = state;
     __skb_unlink(skb, list);
     spin_unlock(&list->lock);
     spin_lock(&dev->done.lock);
     __skb_queue_tail(&dev->done, skb);
     if (dev->done.qlen == 1)
         tasklet_schedule(&dev->bh);
     spin_unlock_irqrestore(&dev->done.lock, flags);
*2 rx_complete (usbnet.c:640)
     state = defer_bh(dev, skb, &dev->rxq, state);

At the same time, the following code repeatedly checks if the queue is
empty and reads these values concurrently with the above changes:

*0  usbnet_terminate_urbs (usbnet.c:765)
     /* maybe wait for deletions to finish. */
     while (!skb_queue_empty(&dev->rxq)
         && !skb_queue_empty(&dev->txq)
         && !skb_queue_empty(&dev->done)) {
             schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
             set_current_state(TASK_UNINTERRUPTIBLE);
             netif_dbg(dev, ifdown, dev->net,
                   "waited for %d urb completions\n", temp);
     }
*1  usbnet_stop (usbnet.c:806)
     if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
         usbnet_terminate_urbs(dev);

As a result, it is possible, for example, that the skb is removed from
dev->rxq by __skb_unlink() before the check
"!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
also possible in this case that the skb is added to dev->done queue
after "!skb_queue_empty(&dev->done)" is checked. So
usbnet_terminate_urbs() may stop waiting and return while dev->done
queue still has an item.
Exactly what problem will that result in?  The tasklet_kill() will wait
for the processing of the single element done queue, and everything will
be fine.  Or?
Given enough time, what prevents defer_bh() from calling 
tasklet_schedule(&dev->bh) *after* usbnet_stop() calls tasklet_kill()?

Consider the following situation (assuming '&&' are changed to '||' in 
that while loop in usbnet_terminate_urbs() as they should be):

CPU0                            CPU1
usbnet_stop()                   defer_bh() with list == dev->rxq
   usbnet_terminate_urbs()
                                 __skb_unlink() removes the last
                                 skb from dev->rxq.
                                 dev->rxq, dev->txq and dev->done
                                 are now empty.
   while (!skb_queue_empty()...)
     The loop ends because all 3
     queues are now empty.

   usbnet_terminate_urbs() ends.

usbnet_stop() continues:
   usbnet_status_stop(dev);
   ...
   del_timer_sync (&dev->delay);
   tasklet_kill (&dev->bh);
                                 __skb_queue_tail(&dev->done, skb);
                                 if (dev->done.qlen == 1)
                                   tasklet_schedule(&dev->bh);

The BH is scheduled at this point, after tasklet_kill() has already
returned, which is not what was intended. The race window is small, but
it is there.

Regards,
Eugene