Re: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring

From: ShuangYu <hidden>
Date: 2026-03-22 09:43:22
Also in: kvm, lkml, virtualization

From: "Michael S. Tsirkin"<mst@redhat.com>
Date:  Mon, Mar 2, 2026, 16:52
Subject:  [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
To: <redacted>
Cc: "ShuangYu"<redacted>, "Stefano Garzarella"<sgarzare@redhat.com>, "Stefan Hajnoczi"<stefanha@redhat.com>, "Jason Wang"<jasowang@redhat.com>, "Eugenio Pérez"<eperezma@redhat.com>, <redacted>, <redacted>, <redacted>
vhost_get_avail_idx is supposed to report whether it has updated
vq->avail_idx. Instead, it returns whether all entries have been
consumed, which is usually the same. But not always - in
drivers/vhost/net.c and when mergeable buffers have been enabled, the
driver checks whether the combined entries are big enough to store an
incoming packet. If not, the driver re-enables notifications with
available entries still in the ring. The incorrect return value from
vhost_get_avail_idx propagates through vhost_enable_notify and causes
the host to livelock if the guest is not making progress, as vhost will
immediately disable notifications and retry using the available entries.

The obvious fix is to make vhost_get_avail_idx do what the comment
says it does and report whether new entries have been added.

Reported-by: ShuangYu <redacted>
Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()")
Cc: Stefano Garzarella <sgarzare@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---

Lightly tested, posting early to simplify testing for the reporter.

 drivers/vhost/vhost.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2f2c45d20883..db329a6f6145 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c

@@ -1522,6 +1522,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)

 static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)
 {
         __virtio16 idx;
+        u16 avail_idx;
         int r;
 
         r = vhost_get_avail(vq, idx, &vq->avail->idx);

@@ -1532,17 +1533,19 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)

         }
 
         /* Check it isn't doing very strange thing with available indexes */
-        vq->avail_idx = vhost16_to_cpu(vq, idx);
-        if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) {
+        avail_idx = vhost16_to_cpu(vq, idx);
+        if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) {
                 vq_err(vq, "Invalid available index change from %u to %u",
-                       vq->last_avail_idx, vq->avail_idx);
+                       vq->last_avail_idx, avail_idx);
                 return -EINVAL;
         }
 
         /* We're done if there is nothing new */
-        if (vq->avail_idx == vq->last_avail_idx)
+        if (avail_idx == vq->avail_idx)
                 return 0;
 
+        vq->avail_idx = avail_idx;
+
         /*
          * We updated vq->avail_idx so we need a memory barrier between
          * the index read above and the caller reading avail ring entries.
-- 
MST

Hi,

Sorry for the delay, it took me some time to build a reliable test environment. 

I've verified the patch on 6.18.10. With 16 concurrent TCP flows over
virtio-net (TSO enabled, vCPU throttled to 1%, 300s duration):

  Before patch (bpftrace + CPU monitor):
    - vhost_discard_vq_desc: up to 3,716,272/3s
    - vhost_enable_notify false positives: up to 3,716,278/3s
      (nearly 1:1 with discard — every partial alloc triggers re-loop)
    - vhost worker CPU: 0~99%, frequently 50-99%
    - successful receives: as low as 137/3s

  After patch:
    - vhost_discard_vq_desc: 9~33/3s
    - vhost_enable_notify false positives: 0 (all 100 sample windows)
    - vhost worker CPU: 0% (all 149 sample points)
    - successful receives: 873~2,667/3s

The partial allocations still occur under load, but after the fix
vhost_get_avail_idx correctly returns 0, so the worker sleeps
instead of spinning. Thanks.

Tested-by: ShuangYu <redacted>

Best regards,
ShuangYu

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help