Thread: 42 messages, 5 authors, 2024-02-26

Re: [PATCH net-next v4 2/2] virtio-net: add cond_resched() to the command waiting loop

From: Jason Wang <jasowang@redhat.com>
Date: 2023-07-26 01:56:38
Also in: lkml, virtualization

On Tue, Jul 25, 2023 at 3:36 PM Michael S. Tsirkin [off-list ref] wrote:
On Tue, Jul 25, 2023 at 11:07:40AM +0800, Jason Wang wrote:
On Mon, Jul 24, 2023 at 3:17 PM Michael S. Tsirkin [off-list ref] wrote:
On Mon, Jul 24, 2023 at 02:52:05PM +0800, Jason Wang wrote:
On Sat, Jul 22, 2023 at 4:18 AM Maxime Coquelin [off-list ref] wrote:
On 7/21/23 17:10, Michael S. Tsirkin wrote:
On Fri, Jul 21, 2023 at 04:58:04PM +0200, Maxime Coquelin wrote:
On 7/21/23 16:45, Michael S. Tsirkin wrote:
On Fri, Jul 21, 2023 at 04:37:00PM +0200, Maxime Coquelin wrote:
On 7/20/23 23:02, Michael S. Tsirkin wrote:
On Thu, Jul 20, 2023 at 01:26:20PM -0700, Shannon Nelson wrote:
On 7/20/23 1:38 AM, Jason Wang wrote:
Add cond_resched() to the command waiting loop for better
cooperation with the scheduler. This gives the CPU a chance to
run other tasks (e.g. workqueues) instead of busy looping when
preemption is not allowed on a device whose CVQ might be slow.

Signed-off-by: Jason Wang <jasowang@redhat.com>
This still leaves hung processes, but at least it doesn't pin the CPU any
more.  Thanks.
Reviewed-by: Shannon Nelson <redacted>
I'd like to see a full solution
1- block until interrupt
I remember in previous versions, you worried about the extra MSI
vector. (Maybe I was wrong).
Would it make sense to also have a timeout?
And when timeout expires, set FAILED bit in device status?
virtio spec does not set any limits on the timing of vq
processing.
Indeed, but I thought the driver could decide it is too long for it.
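
For illustration only, a user-space sketch of what a driver-chosen timeout on the wait loop could look like. This is not what the patch does, and the virtio spec does not define such a limit. The names poll_with_deadline(), now_ms(), countdown_done() and never_done() are hypothetical; a kernel version would use jiffies or ktime and call cond_resched()/cpu_relax() in the loop body.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: wall-clock milliseconds via C11 timespec_get().
 * A kernel driver would use a monotonic source such as jiffies. */
static long long now_ms(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll done(ctx) until it returns true or timeout_ms has elapsed.
 * Returns 0 on completion, -ETIMEDOUT if the deadline passed first.
 */
static int poll_with_deadline(bool (*done)(void *ctx), void *ctx,
			      long timeout_ms)
{
	long long deadline = now_ms() + timeout_ms;

	assert(done != NULL);
	while (!done(ctx)) {
		if (now_ms() >= deadline)
			return -ETIMEDOUT;
		/* kernel equivalent: cond_resched(); cpu_relax(); */
	}
	return 0;
}

/* Example condition: reports completion after a fixed number of polls. */
struct countdown {
	int polls_left;
};

static bool countdown_done(void *ctx)
{
	struct countdown *c = ctx;

	return c->polls_left-- <= 0;
}

/* Example condition: a device that never answers. */
static bool never_done(void *ctx)
{
	(void)ctx;
	return false;
}
```

On -ETIMEDOUT the caller would still have to decide what to do with the device (e.g. mark it FAILED, as suggested above), which is the part the sketch leaves open.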

The issue is we keep waiting with rtnl locked, it can quickly make the
system unusable.
if this is a problem we should find a way not to keep rtnl
locked indefinitely.
Any ideas on this direction? Simply dropping rtnl during the busy loop
will result in a lot of races. This seems to require non-trivial
changes in the networking core.
From the tests I have done, I think it is. With OVS, a reconfiguration
is performed when the VDUSE device is added, and when an MLX5 device is
in the same bridge, it ends up doing an ioctl() that tries to take the
rtnl lock. In this configuration, it is not possible to kill OVS because
it is stuck trying to acquire the rtnl lock for mlx5, which is held by
virtio-net.
Yeah, basically, any RTNL users would be blocked forever.

And the infinite loop has other side effects: for example, it prevents the freezer from working.

To summarize, there are three issues

1) busy polling
2) breaks freezer
3) hold RTNL during the loop

Solving 3 may also help with 2, since some PM routines (e.g. wireguard's,
or even virtnet_restore() itself) may try to take the lock.
Yep. So my feeling currently is, the only real fix is to actually
queue up the work in software.
Do you mean something like:

rtnl_lock();
queue up the work
rtnl_unlock();
return success;

?
yes
We will lose the error reporting, is it a real problem or not?
It's mostly trivial to limit memory consumption: vids are the
only case where it would make sense to have more than
1 command of a given type outstanding.
And rx mode. So this implies we will fail any command if the previous
work is not finished.
don't fail it, store it.
Ok.

Thanks
have a tree
or a bitmap with vids to add/remove?
Probably.

Thanks
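
A minimal user-space sketch of the "bitmap with vids to add/remove" idea, to show why memory stays bounded: two fixed bitmaps (1 KiB total for 4096 vids) hold every pending request, and a later request for the same vid cancels the earlier opposite one instead of queuing a second command. All names here (struct vid_pending, vid_pending_add(), vid_pending_del(), vid_pending_pop()) are hypothetical, and the cancel logic assumes callers only issue valid requests (no delete of a vid that was never added).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define VID_COUNT 4096u	/* 12-bit VLAN id space */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define VID_WORDS ((VID_COUNT + WORD_BITS - 1) / WORD_BITS)

/* Hypothetical store of not-yet-sent VLAN filter commands. */
struct vid_pending {
	unsigned long add[VID_WORDS];	/* vids waiting to be added */
	unsigned long del[VID_WORDS];	/* vids waiting to be removed */
};

static bool vid_test(const unsigned long *map, unsigned int vid)
{
	return (map[vid / WORD_BITS] >> (vid % WORD_BITS)) & 1UL;
}

static void vid_set(unsigned long *map, unsigned int vid)
{
	map[vid / WORD_BITS] |= 1UL << (vid % WORD_BITS);
}

static void vid_clear(unsigned long *map, unsigned int vid)
{
	map[vid / WORD_BITS] &= ~(1UL << (vid % WORD_BITS));
}

/* Store an add request; it cancels a pending delete of the same vid. */
static void vid_pending_add(struct vid_pending *p, unsigned int vid)
{
	assert(vid < VID_COUNT);
	if (vid_test(p->del, vid))
		vid_clear(p->del, vid);
	else
		vid_set(p->add, vid);
}

/* Store a delete request; it cancels a pending add of the same vid. */
static void vid_pending_del(struct vid_pending *p, unsigned int vid)
{
	assert(vid < VID_COUNT);
	if (vid_test(p->add, vid))
		vid_clear(p->add, vid);
	else
		vid_set(p->del, vid);
}

/*
 * Take the lowest pending vid out of the store. Returns false when
 * nothing is pending; otherwise fills *vid and *is_add.
 */
static bool vid_pending_pop(struct vid_pending *p, unsigned int *vid,
			    bool *is_add)
{
	for (unsigned int v = 0; v < VID_COUNT; v++) {
		bool add = vid_test(p->add, v);

		if (!add && !vid_test(p->del, v))
			continue;
		vid_clear(add ? p->add : p->del, v);
		*vid = v;
		*is_add = add;
		return true;
	}
	return false;
}
```

A worker draining this store would call vid_pending_pop() in a loop and send one control command per result, so nothing has to be failed just because an earlier command is still in flight.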

2- still handle surprise removal correctly by waking in that case
This is basically what version 1 did?

https://lore.kernel.org/lkml/6026e801-6fda-fee9-a69b-d06a80368621@redhat.com/t/

Thanks
Yes - except the timeout part.


---
     drivers/net/virtio_net.c | 4 +++-
     1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9f3b1d6ac33d..e7533f29b219 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2314,8 +2314,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
             * into the hypervisor, so the request should be handled immediately.
             */
            while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-              !virtqueue_is_broken(vi->cvq))
+              !virtqueue_is_broken(vi->cvq)) {
+               cond_resched();
                    cpu_relax();
+       }

            return vi->ctrl->status == VIRTIO_NET_OK;
     }
--
2.39.3
