Re: [PATCH RFC] virtio_net: fix refill related races
From: Rusty Russell <hidden>
Date: 2011-12-22 04:35:49
Also in: lkml, virtualization
Subsystem: networking drivers, the rest, virtio net driver
Maintainers: Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds, "Michael S. Tsirkin", Jason Wang

On Wed, 21 Dec 2011 11:06:37 +0200, "Michael S. Tsirkin" [off-list ref] wrote:
> On Wed, Dec 21, 2011 at 10:13:18AM +1030, Rusty Russell wrote:
> > On Tue, 20 Dec 2011 21:45:19 +0200, "Michael S. Tsirkin" [off-list ref] wrote:
> > > On Tue, Dec 20, 2011 at 11:31:54AM -0800, Tejun Heo wrote:
> > > > On Tue, Dec 20, 2011 at 09:30:55PM +0200, Michael S. Tsirkin wrote:
> > > > > Hmm, in that case it looks like a nasty race could get
> > > > > triggered, with try_fill_recv run on multiple CPUs in parallel,
> > > > > corrupting the linked list within the vq.
> > > > >
> > > > > Using the mutex as my patch did will fix that naturally, as well.
> > > >
> > > > Don't know the code but just use nrt wq.  There's even a system one
> > > > called system_nrq_wq.
> > > >
> > > > Thanks.
> > >
> > > We can, but we need the mutex for other reasons, anyway.
> >
> > Well, here's the alternate approach.  What do you think?
>
> It looks very clean, thanks. Some documentation suggestions below.
> Also - Cc stable on this and the block patch?

AFAICT we haven't seen this bug, and theoretical bugs don't get into -stable.

> > Finding two wq issues makes you justifiably cautious, but it almost
> > feels like giving up to simply wrap it in a lock.  The APIs are designed
> > to let us do it without a lock; I was just using them wrong.
>
> One thing I note is that this scheme works because there's a single
> entity disabling/enabling napi and the refill thread.
>
> So it's possible that Amit will need to add a lock and track NAPI
> state anyway to make suspend work. But we'll see.

Fixed typo, documented the locking, queued for -next.

Thanks!
Rusty.

From: Rusty Russell <redacted>
Subject: virtio_net: set/cancel work on ndo_open/ndo_stop

Michael S. Tsirkin noticed that we could run the refill work after
ndo_close, which can re-enable napi - we don't disable it until
virtnet_remove.  This is clearly wrong, so move the workqueue control
to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).

One subtle point: virtnet_probe() could simply fail if it couldn't
allocate a receive buffer, but that's less polite in virtnet_open() so
we schedule a refill as we do in the normal receive path if we run out
of memory.

Signed-off-by: Rusty Russell <redacted>
---
 drivers/net/virtio_net.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -439,7 +439,13 @@ static int add_recvbuf_mergeable(struct
 	return err;
 }
 
-/* Returns false if we couldn't fill entirely (OOM). */
+/*
+ * Returns false if we couldn't fill entirely (OOM).
+ *
+ * Normally run in the receive path, but can also be run from ndo_open
+ * before we're receiving packets, or from refill_work which is
+ * careful to disable receiving (using napi_disable).
+ */
 static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
 {
 	int err;
@@ -719,6 +725,10 @@ static int virtnet_open(struct net_devic
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 
+	/* Make sure we have some buffers: if oom use wq. */
+	if (!try_fill_recv(vi, GFP_KERNEL))
+		schedule_delayed_work(&vi->refill, 0);
+
 	virtnet_napi_enable(vi);
 	return 0;
 }
@@ -772,6 +782,8 @@ static int virtnet_close(struct net_devi
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 
+	/* Make sure refill_work doesn't re-enable napi! */
+	cancel_delayed_work_sync(&vi->refill);
 	napi_disable(&vi->napi);
 
 	return 0;
@@ -1082,7 +1094,6 @@ static int virtnet_probe(struct virtio_d
 
 unregister:
 	unregister_netdev(dev);
-	cancel_delayed_work_sync(&vi->refill);
 free_vqs:
 	vdev->config->del_vqs(vdev);
 free_stats:
@@ -1121,9 +1132,7 @@ static void __devexit virtnet_remove(str
 	/* Stop all the virtqueues. */
 	vdev->config->reset(vdev);
 
-
 	unregister_netdev(vi->dev);
-	cancel_delayed_work_sync(&vi->refill);
 
 	/* Free unused buffers in both send and recv, if any. */
 	free_unused_bufs(vi);
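
For anyone following the ordering argument without the driver source at hand, here is a minimal user-space sketch of why the cancel has to come before napi_disable() in virtnet_close(). It is a toy, single-threaded model and not kernel code: every model_* name is hypothetical, the two booleans merely stand in for NAPI state and a queued delayed work item, and the cancel_first flag exists only so both orderings can be compared. The refill step assumes the disable/fill/enable sequence described in the try_fill_recv comment in the patch.

```c
#include <stdbool.h>

/* Toy user-space model of the open/close ordering; NOT kernel code.
 * All model_* names are hypothetical stand-ins for the driver's state. */
struct model_dev {
	bool napi_enabled;	/* stands in for napi_enable()/napi_disable() */
	bool refill_pending;	/* stands in for a queued delayed work item */
	int free_bufs;		/* buffers the "allocator" can still hand out */
	int posted_bufs;	/* buffers posted to the receive queue */
	int ring_size;		/* capacity of the receive queue */
};

/* Like try_fill_recv(): returns false if we couldn't fill entirely. */
static bool model_try_fill(struct model_dev *d)
{
	while (d->posted_bufs < d->ring_size) {
		if (d->free_bufs == 0)
			return false;	/* simulated OOM */
		d->free_bufs--;
		d->posted_bufs++;
	}
	return true;
}

/* Like the refill work: pause receive, refill, resume receive. */
static void model_run_pending_work(struct model_dev *d)
{
	if (!d->refill_pending)
		return;
	d->refill_pending = false;
	d->napi_enabled = false;
	if (!model_try_fill(d))
		d->refill_pending = true;	/* retry later */
	d->napi_enabled = true;	/* the hazard if this runs after close */
}

/* Like virtnet_open() after the patch: fill, fall back to the work item. */
static void model_open(struct model_dev *d)
{
	if (!model_try_fill(d))
		d->refill_pending = true;
	d->napi_enabled = true;
}

/* Like virtnet_close(): cancel_first selects the patched ordering. */
static void model_close(struct model_dev *d, bool cancel_first)
{
	if (cancel_first)
		d->refill_pending = false;	/* cancel_delayed_work_sync() */
	d->napi_enabled = false;
}
```

With cancel_first set, a late model_run_pending_work() call finds nothing queued and the device stays quiesced; without it, the leftover work item switches napi_enabled back on after close, which is the behaviour the patch removes. The model says nothing about real concurrency, which is what the _sync in cancel_delayed_work_sync() covers in the driver.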