Re: [PATCH] virtio-net: fix race between ndo_open() and virtio_device_ready()
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2022-06-17 12:33:23
Also in: lkml, virtualization
On Fri, Jun 17, 2022 at 07:46:23PM +0800, Jason Wang wrote:
> On Fri, Jun 17, 2022 at 6:13 PM Michael S. Tsirkin [off-list ref] wrote:
> > On Fri, Jun 17, 2022 at 03:29:49PM +0800, Jason Wang wrote:
> > > We used to call virtio_device_ready() after netdev registration. This
> > > cause a race between ndo_open() and virtio_device_ready(): if
> > > ndo_open() is called before virtio_device_ready(), the driver may
> > > start to use the device before DRIVER_OK which violates the spec.
> > >
> > > Fixing this by switching to use register_netdevice() and protect the
> > > virtio_device_ready() with rtnl_lock() to make sure ndo_open() can
> > > only be called after virtio_device_ready().
> > >
> > > Fixes: 4baf1e33d0842 ("virtio_net: enable VQs early")
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >  drivers/net/virtio_net.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index db05b5e930be..8a5810bcb839 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -3655,14 +3655,20 @@ static int virtnet_probe(struct virtio_device *vdev)
> > >  	if (vi->has_rss || vi->has_rss_hash_report)
> > >  		virtnet_init_default_rss(vi);
> > >  
> > > -	err = register_netdev(dev);
> > > +	/* serialize netdev register + virtio_device_ready() with ndo_open() */
> > > +	rtnl_lock();
> > > +
> > > +	err = register_netdevice(dev);
> > >  	if (err) {
> > >  		pr_debug("virtio_net: registering device failed\n");
> > > +		rtnl_unlock();
> > >  		goto free_failover;
> > >  	}
> > >  
> > >  	virtio_device_ready(vdev);
> > >  
> > > +	rtnl_unlock();
> > > +
> > >  	err = virtnet_cpu_notif_add(vi);
> > >  	if (err) {
> > >  		pr_debug("virtio_net: registering cpu notifier failed\n");
> >
> > Looks good but then don't we have the same issue when removing the
> > device?
> >
> > Actually I looked at virtnet_remove and I see
> >         unregister_netdev(vi->dev);
> >         net_failover_destroy(vi->failover);
> >         remove_vq_common(vi); <- this will reset the device
> >
> > a window here?
>
> Probably. For safety, we probably need to reset before unregistering.

Careful not to create new races; let's analyse this one to be sure first.
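
To make the ordering argument above concrete, here is a minimal userspace sketch. It is not kernel code: `struct model_dev`, `model_open()` and the two `model_probe_*()` functions are hypothetical names invented for illustration, and a plain flag stands in for rtnl_lock(). It relies only on two facts from the thread and the kernel: register_netdev() takes and releases the rtnl lock internally, and ndo_open() runs with the rtnl lock held. A "concurrent" open is simulated by calling `model_open()` at the point where another task could get in.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only. Field and function names are hypothetical and
 * merely mirror the kernel concepts discussed in this thread. */
struct model_dev {
	bool rtnl_held;    /* probe currently holds the (modelled) rtnl lock */
	bool registered;   /* netdev is visible, so an open can be attempted */
	bool driver_ok;    /* virtio_device_ready() has set DRIVER_OK */
	bool opened_early; /* an open ran before DRIVER_OK */
};

enum open_result {
	OPEN_NO_DEVICE,    /* nothing registered yet */
	OPEN_WOULD_BLOCK,  /* real code would sleep on the rtnl lock */
	OPEN_BEFORE_READY, /* the race: device used before DRIVER_OK */
	OPEN_OK,
};

/* Models the open path, which needs the rtnl lock before it can run. */
static enum open_result model_open(struct model_dev *d)
{
	if (d->rtnl_held)
		return OPEN_WOULD_BLOCK;
	if (!d->registered)
		return OPEN_NO_DEVICE;
	if (!d->driver_ok) {
		d->opened_early = true;
		return OPEN_BEFORE_READY;
	}
	return OPEN_OK;
}

/* Old ordering: the lock is dropped inside registration, leaving a gap
 * before DRIVER_OK is set. Returns what a racing open would observe. */
static enum open_result model_probe_old(struct model_dev *d)
{
	enum open_result racing_open;

	assert(!d->registered);
	d->rtnl_held = true;          /* inside register_netdev() */
	d->registered = true;
	d->rtnl_held = false;         /* register_netdev() returns */
	racing_open = model_open(d);  /* simulated open in the gap */
	d->driver_ok = true;          /* virtio_device_ready() */
	return racing_open;
}

/* Patched ordering: registration and DRIVER_OK share one locked section. */
static enum open_result model_probe_patched(struct model_dev *d)
{
	enum open_result racing_open;

	assert(!d->registered);
	d->rtnl_held = true;          /* rtnl_lock() */
	d->registered = true;         /* register_netdevice() */
	racing_open = model_open(d);  /* simulated open: has to wait */
	d->driver_ok = true;          /* virtio_device_ready() */
	d->rtnl_held = false;         /* rtnl_unlock() */
	return racing_open;
}
```

Calling `model_probe_old()` on a zeroed `struct model_dev` reports `OPEN_BEFORE_READY`, while `model_probe_patched()` reports `OPEN_WOULD_BLOCK` and a later `model_open()` succeeds. The model says nothing about the remove path; whether resetting before unregistering is safe is exactly the analysis asked for above.
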

> > Really, I think what we had originally was a better idea - instead
> > of dropping interrupts they were delayed and when driver is ready
> > to accept them it just enables them.
>
> The problem is that it works only on some specific setup:
>
> - doesn't work on shared IRQ
> - doesn't work on some specific driver e.g virtio-blk

Can some core IRQ work fix that?

> > We just need to make sure driver does not wait for interrupts
> > before enabling them.
> >
> > And I suspect we need to make this opt-in on a per driver basis.
>
> Exactly. Thanks
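
The "delay instead of drop" idea with a per-driver opt-in can be sketched as a small userspace model. Everything here is an assumption made for illustration: `struct model_irq`, its `defer_opt_in` flag and the two helpers are hypothetical and do not correspond to any existing virtio or IRQ core API. Early interrupts are coalesced into a single pending bit, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; all names are hypothetical. */
struct model_irq {
	bool defer_opt_in; /* driver opted in to delayed delivery */
	bool enabled;      /* driver is ready to accept interrupts */
	bool pending;      /* an interrupt arrived before enabling */
	int delivered;     /* interrupts that reached the handler */
	int dropped;       /* interrupts discarded before enabling */
};

/* The device raises an interrupt. */
static void model_irq_raise(struct model_irq *q)
{
	if (q->enabled)
		q->delivered++;
	else if (q->defer_opt_in)
		q->pending = true; /* remember it for later */
	else
		q->dropped++;      /* non-opted-in drivers lose it */
}

/* The driver declares itself ready; a remembered interrupt is replayed. */
static void model_irq_enable(struct model_irq *q)
{
	assert(!q->enabled); /* the model assumes a single enable */
	q->enabled = true;
	if (q->pending) {
		q->pending = false;
		q->delivered++;
	}
}
```

With `defer_opt_in` set, two raises before `model_irq_enable()` end up as one delivered interrupt and none dropped; without it, an early raise is counted as dropped. The sketch does not address the shared-IRQ objection raised above, and a driver that waits for an interrupt before calling the enable step would still hang, which is the condition the quoted text says must be avoided.
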
> > > --
> > > 2.25.1