Re: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command
From: Jason Wang <jasowang@redhat.com>
Date: 2022-12-26 03:46:08
Also in:
lkml, virtualization
On Fri, Dec 23, 2022 at 4:05 PM Eugenio Perez Martin [off-list ref] wrote:
On Fri, Dec 23, 2022 at 4:04 AM Jason Wang [off-list ref] wrote:quoted
On Thu, Dec 22, 2022 at 5:19 PM Eugenio Perez Martin [off-list ref] wrote:quoted
On Thu, Dec 22, 2022 at 7:05 AM Jason Wang [off-list ref] wrote:quoted
We used to busy waiting on the cvq command this tends to be problematic since: 1) CPU could wait for ever on a buggy/malicous device 2) There's no wait to terminate the process that triggers the cvq command So this patch switch to use sleep with a timeout (1s) instead of busy polling for the cvq command forever. This gives the scheduler a breath and can let the process can respond to a signal. Signed-off-by: Jason Wang <jasowang@redhat.com> --- drivers/net/virtio_net.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 8225496ccb1e..69173049371f 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c@@ -405,6 +405,7 @@ static void disable_rx_mode_work(struct virtnet_info *vi) vi->rx_mode_work_enabled = false; spin_unlock_bh(&vi->rx_mode_lock); + virtqueue_wake_up(vi->cvq); flush_work(&vi->rx_mode_work); }@@ -1497,6 +1498,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq, return !oom; } +static void virtnet_cvq_done(struct virtqueue *cvq) +{ + virtqueue_wake_up(cvq); +} + static void skb_recv_done(struct virtqueue *rvq) { struct virtnet_info *vi = rvq->vdev->priv;@@ -2024,12 +2030,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd, if (unlikely(!virtqueue_kick(vi->cvq))) return vi->ctrl->status == VIRTIO_NET_OK; - /* Spin for a response, the kick causes an ioport write, trapping - * into the hypervisor, so the request should be handled immediately. - */ - while (!virtqueue_get_buf(vi->cvq, &tmp) && - !virtqueue_is_broken(vi->cvq)) - cpu_relax(); + virtqueue_wait_for_used(vi->cvq, &tmp); return vi->ctrl->status == VIRTIO_NET_OK; }@@ -3524,7 +3525,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi) /* Parameters for control virtqueue, if any */ if (vi->has_cvq) { - callbacks[total_vqs - 1] = NULL; + callbacks[total_vqs - 1] = virtnet_cvq_done;If we're using CVQ callback, what is the actual use of the timeout?Because we can't sleep forever since locks could be held like RTNL_LOCK.Right, rtnl_lock kind of invalidates it for a general case. But do all of the commands need to take rtnl_lock? For example I see how we could remove it from ctrl_announce,
I think not, it's intended to serialize all cvq commands.
so lack of ack may not be fatal for it.
Then there could be more than one cvq commands sent to the device, the busy poll logic may not work. And it's a hint that the device malfunctioned which is something that the driver should be aware of. Thanks
Assuming a buggy device, we can take some cvq commands out of this fatal situation. This series already improves the current situation and my suggestion (if it's worth it) can be applied on top of it, so it is not a blocker at all.quoted
quoted
I'd say there is no right choice neither in the right timeout value nor in the action to take.In the next version, I tend to put BAD_RING() to prevent future requests.quoted
Why not simply trigger the cmd and do all the changes at command return?I don't get this, sorry.It's actually expanding the first point so you already answered it :). Thanks!quoted
quoted
I suspect the reason is that it complicates the code. For example, having the possibility of many in flight commands, races between their completion, etc.Actually the cvq command was serialized through RTNL_LOCK, so we don't need to worry about this. In the next version I can add ASSERT_RTNL(). Thanksquoted
The virtio standard does not even cover unordered used commands if I'm not wrong. Is there any other fundamental reason? Thanks!quoted
names[total_vqs - 1] = "control"; } -- 2.25.1