Re: Proper suspend/resume flow
From: Ben Hutchings <hidden>
Date: 2014-03-29 16:37:38
On Wed, 2014-03-26 at 11:58 +0000, Russell King - ARM Linux wrote:
I've been looking at the Freescale FEC driver recently, and I've noticed
that it's suspend path looks like this:
if (netif_running(ndev)) {
netif_device_detach(ndev);
fec_stop(ndev);
}
I believe this to be unsafe, because if the device is active, then we may
have packets in the transmit ring. It may be possible for the ring to be
cleaned between netif_device_detach() and fec_stop(), resulting in a call
to netif_wake_queue(), which doesn't seem to take account of whether the
device has been detached.
I wondered if this was just limited to the freescale driver, and I found
a similar pattern in e1000e - __e1000e_shutdown() calls netif_device_detach()
as the very first thing, before doing anything else. This seems to be a
common pattern. 8139cp.c does this, the second call seems to be rather
redundant:
netif_device_detach (dev);
netif_stop_queue (dev);
Tulip on the other hand shuts the hardware down and cleans the transmit
ring first before calling netif_device_detach(), thus ensuring that there
won't be any transmit completions before this call. It doesn't stop the
queue before this, so presumably in a SMP environment, it is possible for
packets to get queued after the transmit ring has been cleaned.
So, what is the correct approach to suspending a net device? When should
netif_device_detach() be called?[...] My advice here is based on Solarflare's extensive stress-testing of the sfc driver control path and the race conditions it uncovered. It did not cover suspend/resume, but it does cover MTU and ring size changes which involve the same sort of stop/start while the device is still marked as running. - Given the way the watchdog works, I think netif_device_detach() must be called first if you are going to stop TX queues in a control operation other than ndo_stop (commits e4abce853849, 29c69a488264). - The driver must check netif_device_present() before calling netif_wake_queue(). Perhaps netif_wake_queue() itself should check that, if many drivers need to do it. - Calls to netif_device_detach() may need to be synchronised with the TX scheduler (commits c2f3b8e3a44b, 35205b211c8d). To support this, the function efx_device_detach_sync() maybe should be turned into a general netif_device_detach_sync() (and then possibly needs to disable IRQs). Ben. -- Ben Hutchings [W]e found...that it wasn't as easy to get programs right as we had thought. ... I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs. - Maurice Wilkes, 1949
Attachments
- signature.asc [application/pgp-signature] 811 bytes