Re: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring
From: ShuangYu <hidden>
Date: 2026-03-22 09:43:22
Also in:
kvm, lkml, virtualization
quoted hunk ↗ jump to hunk
From: "Michael S. Tsirkin"<mst@redhat.com> Date: Mon, Mar 2, 2026, 16:52 Subject: [PATCH RFC] vhost: fix vhost_get_avail_idx for a non empty ring To: <redacted> Cc: "ShuangYu"<redacted>, "Stefano Garzarella"<sgarzare@redhat.com>, "Stefan Hajnoczi"<stefanha@redhat.com>, "Jason Wang"<jasowang@redhat.com>, "Eugenio Pérez"<eperezma@redhat.com>, <redacted>, <redacted>, <redacted> vhost_get_avail_idx is supposed to report whether it has updated vq->avail_idx. Instead, it returns whether all entries have been consumed, which is usually the same. But not always - in drivers/vhost/net.c and when mergeable buffers have been enabled, the driver checks whether the combined entries are big enough to store an incoming packet. If not, the driver re-enables notifications with available entries still in the ring. The incorrect return value from vhost_get_avail_idx propagates through vhost_enable_notify and causes the host to livelock if the guest is not making progress, as vhost will immediately disable notifications and retry using the available entries. The obvious fix is to make vhost_get_avail_idx do what the comment says it does and report whether new entries have been added. Reported-by: ShuangYu <redacted> Fixes: d3bb267bbdcb ("vhost: cache avail index in vhost_enable_notify()") Cc: Stefano Garzarella <sgarzare@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> --- Lightly tested, posting early to simplify testing for the reporter. drivers/vhost/vhost.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 2f2c45d20883..db329a6f6145 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c@@ -1522,6 +1522,7 @@ static void vhost_dev_unlock_vqs(struct vhost_dev *d)static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq) { __virtio16 idx; + u16 avail_idx; int r; r = vhost_get_avail(vq, idx, &vq->avail->idx);@@ -1532,17 +1533,19 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq)} /* Check it isn't doing very strange thing with available indexes */ - vq->avail_idx = vhost16_to_cpu(vq, idx); - if (unlikely((u16)(vq->avail_idx - vq->last_avail_idx) > vq->num)) { + avail_idx = vhost16_to_cpu(vq, idx); + if (unlikely((u16)(avail_idx - vq->last_avail_idx) > vq->num)) { vq_err(vq, "Invalid available index change from %u to %u", - vq->last_avail_idx, vq->avail_idx); + vq->last_avail_idx, avail_idx); return -EINVAL; } /* We're done if there is nothing new */ - if (vq->avail_idx == vq->last_avail_idx) + if (avail_idx == vq->avail_idx) return 0; + vq->avail_idx = avail_idx; + /* * We updated vq->avail_idx so we need a memory barrier between * the index read above and the caller reading avail ring entries. -- MST
Hi, Sorry for the delay, it took me some time to build a reliable test environment. I've verified the patch on 6.18.10. With 16 concurrent TCP flows over virtio-net (TSO enabled, vCPU throttled to 1%, 300s duration): Before patch (bpftrace + CPU monitor): - vhost_discard_vq_desc: up to 3,716,272/3s - vhost_enable_notify false positives: up to 3,716,278/3s (nearly 1:1 with discard — every partial alloc triggers re-loop) - vhost worker CPU: 0~99%, frequently 50-99% - successful receives: as low as 137/3s After patch: - vhost_discard_vq_desc: 9~33/3s - vhost_enable_notify false positives: 0 (all 100 sample windows) - vhost worker CPU: 0% (all 149 sample points) - successful receives: 873~2,667/3s The partial allocations still occur under load, but after the fix vhost_get_avail_idx correctly returns 0, so the worker sleeps instead of spinning. Thanks. Tested-by: ShuangYu <redacted> Best regards, ShuangYu