Re: [PATCH] vsock/virtio: cap TX credit to local buffer size
From: Stefano Garzarella <sgarzare@redhat.com>
Date: 2025-12-11 09:10:57
Also in:
kvm, lkml, virtualization
On Wed, Dec 10, 2025 at 04:00:19PM +0100, Melbin K Mathew wrote:
The virtio vsock transport currently derives its TX credit directly from peer_buf_alloc, which is set from the remote endpoint's SO_VM_SOCKETS_BUFFER_SIZE value.
Why removing the target tree [net] from the tags? Also this is a v2, so the tags should have been [PATCH net v2], please check it in next versions, more info: https://www.kernel.org/doc/html/latest/process/submitting-patches.html#subject-line
On the host side this means that the amount of data we are willing to queue for a connection is scaled by a guest-chosen buffer size, rather than the host's own vsock configuration. A malicious guest can advertise a large buffer and read slowly, causing the host to allocate a correspondingly large amount of sk_buff memory. Introduce a small helper, virtio_transport_peer_buf_alloc(), that returns min(peer_buf_alloc, buf_alloc), and use it wherever we consume peer_buf_alloc: - virtio_transport_get_credit() - virtio_transport_has_space() - virtio_transport_seqpacket_enqueue() This ensures the effective TX window is bounded by both the peer's advertised buffer and our own buf_alloc (already clamped to buffer_max_size via SO_VM_SOCKETS_BUFFER_MAX_SIZE), so a remote guest cannot force the host to queue more data than allowed by the host's own vsock settings. On an unpatched Ubuntu 22.04 host (~64 GiB RAM), running a PoC with 32 guest vsock connections advertising 2 GiB each and reading slowly drove Slab/SUnreclaim from ~0.5 GiB to ~57 GiB and the system only recovered after killing the QEMU process. With this patch applied, rerunning the same PoC yields: Before: MemFree: ~61.6 GiB MemAvailable: ~62.3 GiB Slab: ~142 MiB SUnreclaim: ~117 MiB After 32 high-credit connections: MemFree: ~61.5 GiB MemAvailable: ~62.3 GiB Slab: ~178 MiB SUnreclaim: ~152 MiB i.e. only ~35 MiB increase in Slab/SUnreclaim, no host OOM, and the guest remains responsive.
I think we should include here a summary of what you replied to Michael about other transports. I can't find your reply in the archive, but I mean the reply to https://lore.kernel.org/netdev/20251210084318-mutt-send-email-mst@kernel.org/ (local)
quoted hunk ↗ jump to hunk
Fixes: 06a8fc78367d ("VSOCK: Introduce virtio_vsock_common.ko") Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Melbin K Mathew <redacted> --- net/vmw_vsock/virtio_transport_common.c | 27 ++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-)diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index dcc8a1d58..02eeb96dd 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c@@ -491,6 +491,25 @@ void virtio_transport_consume_skb_sent(struct sk_buff *skb, bool consume)} EXPORT_SYMBOL_GPL(virtio_transport_consume_skb_sent); +/* + * Return the effective peer buffer size for TX credit computation.
nit: block comment in this file doesn't leave empty line, so I'd follow it:
@@ -491,8 +491,7 @@ void virtio_transport_consume_skb_sent(struct sk_buff *skb, bool consume) } EXPORT_SYMBOL_GPL(virtio_transport_consume_skb_sent); -/* - * Return the effective peer buffer size for TX credit computation. +/* Return the effective peer buffer size for TX credit computation. * * The peer advertises its receive buffer via peer_buf_alloc, but we * cap that to our local buf_alloc (derived from
+ *
+ * The peer advertises its receive buffer via peer_buf_alloc, but we
+ * cap that to our local buf_alloc (derived from
+ * SO_VM_SOCKETS_BUFFER_SIZE and already clamped to buffer_max_size)
+ * so that a remote endpoint cannot force us to queue more data than
+ * our own configuration allows.
+ */
+static u32 virtio_transport_tx_buf_alloc(struct virtio_vsock_sock *vvs)
+{
+ u32 peer = vvs->peer_buf_alloc;
+ u32 local = vvs->buf_alloc;
+
+ if (peer > local)
+ return local;
+ return peer;
+}
+I think here Michael was suggesting this:
@@ -502,12 +502,7 @@ EXPORT_SYMBOL_GPL(virtio_transport_consume_skb_sent); */ static u32 virtio_transport_tx_buf_alloc(struct virtio_vsock_sock *vvs) { - u32 peer = vvs->peer_buf_alloc; - u32 local = vvs->buf_alloc; - - if (peer > local) - return local; - return peer; + return min(vvs->peer_buf_alloc, vvs->buf_alloc); }
quoted hunk ↗ jump to hunk
u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit) { u32 ret;@@ -499,7 +518,8 @@ u32 virtio_transport_get_credit(struct virtio_vsock_sock *vvs, u32 credit)return 0; spin_lock_bh(&vvs->tx_lock); - ret = vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt); + ret = virtio_transport_tx_buf_alloc(vvs) - + (vvs->tx_cnt - vvs->peer_fwd_cnt); if (ret > credit) ret = credit; vvs->tx_cnt += ret;@@ -831,7 +851,7 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk,spin_lock_bh(&vvs->tx_lock); - if (len > vvs->peer_buf_alloc) { + if (len > virtio_transport_tx_buf_alloc(vvs)) { spin_unlock_bh(&vvs->tx_lock); return -EMSGSIZE; }@@ -882,7 +902,8 @@ static s64 virtio_transport_has_space(struct vsock_sock *vsk)struct virtio_vsock_sock *vvs = vsk->trans; s64 bytes; - bytes = (s64)vvs->peer_buf_alloc - (vvs->tx_cnt - vvs->peer_fwd_cnt); + bytes = (s64)virtio_transport_tx_buf_alloc(vvs) - + (vvs->tx_cnt - vvs->peer_fwd_cnt);
nit: please align this:
@@ -903,7 +898,7 @@ static s64 virtio_transport_has_space(struct vsock_sock *vsk) s64 bytes; bytes = (s64)virtio_transport_tx_buf_alloc(vvs) - - (vvs->tx_cnt - vvs->peer_fwd_cnt); + (vvs->tx_cnt - vvs->peer_fwd_cnt); if (bytes < 0) bytes = 0;
Just minor things, but the patch LGTM, thanks! Stefano
if (bytes < 0) bytes = 0; -- 2.34.1