Thread: 15 messages, 4 authors, 2026-03-03

Re: [PATCH] vsock: Enable H2G override

From: Stefano Garzarella <sgarzare@redhat.com>
Date: 2026-03-02 16:26:12
Also in: kvm, lkml, virtualization

On Mon, Mar 02, 2026 at 04:48:33PM +0100, Alexander Graf wrote:
On 02.03.26 13:06, Stefano Garzarella wrote:
CCing Bryan, Vishnu, and Broadcom list.

On Mon, Mar 02, 2026 at 12:47:05PM +0100, Stefano Garzarella wrote:
Please target net-next tree for this new feature.

On Mon, Mar 02, 2026 at 10:41:38AM +0000, Alexander Graf wrote:
Vsock maintains a single CID number space which can be used to
communicate to the host (G2H) or to a child-VM (H2G). The current logic
trivially assumes that G2H is only relevant for CID <= 2 because these
target the hypervisor.  However, in environments like Nitro Enclaves, an
instance that hosts vhost_vsock powered VMs may still want to communicate
to Enclaves that are reachable at higher CIDs through virtio-vsock-pci.

That means that for CID > 2, we really want an overlay. By default, all
CIDs are owned by the hypervisor. But if vhost registers a CID, it takes
precedence.  Implement that logic. Vhost already knows which CIDs it
supports anyway.

With this logic, I can run a Nitro Enclave as well as a nested VM with
vhost-vsock support in parallel, with the parent instance able to
communicate to both simultaneously.
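
For readers following along, the overlay rule described in the commit message can be modelled in plain user-space C. This is only a sketch, not the kernel code: the names model_h2g, model_pick_route and demo_has_cid are hypothetical, CID 5 is an arbitrary example of a CID registered by vhost, and only the two branches visible in the patch are modelled (the earlier branches of vsock_assign_transport() are left out).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CID_HOST     2u    /* same value as VMADDR_CID_HOST */
#define MODEL_FLAG_TO_HOST 0x01u /* same value as VMADDR_FLAG_TO_HOST */

enum model_route { MODEL_ROUTE_G2H, MODEL_ROUTE_H2G };

/* Hypothetical stand-in for an H2G transport such as vhost-vsock. */
struct model_h2g {
    /* Optional callback; NULL means the transport cannot answer. */
    bool (*has_cid)(uint32_t cid);
};

/* Assumption for the demo: the H2G transport serves exactly CID 5. */
static bool demo_has_cid(uint32_t cid)
{
    return cid == 5;
}

/* Decide which direction a connection to remote_cid should take. */
static enum model_route model_pick_route(const struct model_h2g *h2g,
                                         uint32_t remote_cid,
                                         uint8_t remote_flags)
{
    /* Pre-patch rule: low CIDs, no H2G transport, or an explicit flag. */
    if (remote_cid <= MODEL_CID_HOST || h2g == NULL ||
        (remote_flags & MODEL_FLAG_TO_HOST))
        return MODEL_ROUTE_G2H;

    /* Proposed rule: H2G keeps the CID only if it actually serves it. */
    if (h2g->has_cid != NULL && !h2g->has_cid(remote_cid))
        return MODEL_ROUTE_G2H;

    return MODEL_ROUTE_H2G;
}

/* Small self-check exercising each branch of the model. */
static void model_self_check(void)
{
    struct model_h2g with_cb = { .has_cid = demo_has_cid };
    struct model_h2g without_cb = { .has_cid = NULL };

    assert(model_pick_route(&with_cb, 2, 0) == MODEL_ROUTE_G2H);
    assert(model_pick_route(&with_cb, 5, 0) == MODEL_ROUTE_H2G);
    assert(model_pick_route(&with_cb, 16, 0) == MODEL_ROUTE_G2H);
    assert(model_pick_route(&with_cb, 5, MODEL_FLAG_TO_HOST) == MODEL_ROUTE_G2H);
    assert(model_pick_route(&without_cb, 16, 0) == MODEL_ROUTE_H2G);
    assert(model_pick_route(NULL, 16, 0) == MODEL_ROUTE_G2H);
}
```

In this model a transport without a has_cid callback keeps the pre-patch behaviour, which matches the NULL check in the af_vsock.c hunk further down.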
I honestly don't understand why VMADDR_FLAG_TO_HOST (added 
specifically for Nitro IIRC) isn't enough for this scenario and we 
have to add this change.  Can you elaborate a bit more about the 
relationship between this change and VMADDR_FLAG_TO_HOST we added?

The main problem I have with VMADDR_FLAG_TO_HOST for connect() is that 
it punts the complexity to the user. Instead of a single CID address 
space, you now effectively create 2 spaces: One for TO_HOST (needs a 
flag) and one for TO_GUEST (no flag). But every user space tool needs 
to learn about this flag. That may work for super special-case 
applications. But propagating that all the way into socat, iperf, 
etc.? It's just creating friction.
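
To make the friction concrete, here is a minimal user-space sketch of what a tool has to do today to reach a sibling VM or Enclave: set VMADDR_FLAG_TO_HOST in svm_flags before calling connect(). The helper names vsock_fill_addr and vsock_connect are hypothetical, and the sketch assumes kernel headers recent enough to provide the svm_flags field in struct sockaddr_vm.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Fill a vsock address; to_host selects the G2H path explicitly. */
static void vsock_fill_addr(struct sockaddr_vm *addr, unsigned int cid,
                            unsigned int port, bool to_host)
{
    assert(addr != NULL);
    memset(addr, 0, sizeof(*addr));
    addr->svm_family = AF_VSOCK;
    addr->svm_cid = cid;
    addr->svm_port = port;
    if (to_host)
        addr->svm_flags |= VMADDR_FLAG_TO_HOST;
}

/* Open a stream socket and connect; returns the fd or -1 on error. */
static int vsock_connect(unsigned int cid, unsigned int port, bool to_host)
{
    struct sockaddr_vm addr;
    int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    vsock_fill_addr(&addr, cid, port, to_host);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would use vsock_connect(16, 5000, true) for a sibling and vsock_connect(16, 5000, false) for a nested guest (CID 16 and port 5000 are made-up values). Every tool that builds a sockaddr_vm would need an equivalent of the to_host switch, which is the point being argued here.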
Okay, I would like to have this (or part of it) in the commit message to 
better explain why we want this change.
IMHO the most natural experience is to have a single CID space, 
potentially manually segmented by launching VMs of one kind within a 
certain range.
I see, but at this point, should the kernel set VMADDR_FLAG_TO_HOST in 
the remote address if that path is taken "automagically" ?

So in that way the user space can have a way to understand if it's 
talking with a nested guest or a sibling guest.


That said, I'm concerned about the scenario where an application does 
not even consider communicating with a sibling VM.

Until now, it knew that by not setting that flag, it could only talk to 
nested VMs, so if there was no VM with that CID, the connection simply 
failed. Whereas from this patch onwards, if the device in the host 
supports sibling VMs and there is a VM with that CID, the application 
finds itself talking to a sibling VM instead of a nested one, without 
having any idea.

Should we make this feature opt-in in some way, such as sockopt or 
sysctl? (I understand that there is the previous problem, but honestly, 
it seems like a significant change to the behavior of AF_VSOCK).
At the end of the day, the host vs guest problem is super similar to a 
routing table.
Yeah, but the point of AF_VSOCK is precisely to avoid complexities such 
as routing tables as much as possible; otherwise, AF_INET is already 
there and ready to be used. In theory, we only want communication 
between host and guest.
Signed-off-by: Alexander Graf <graf@amazon.com>
---
drivers/vhost/vsock.c    | 11 +++++++++++
include/net/af_vsock.h   |  3 +++
net/vmw_vsock/af_vsock.c |  3 +++
3 files changed, 17 insertions(+)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 054f7a718f50..223da817e305 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -91,6 +91,16 @@ static struct vhost_vsock *vhost_vsock_get(u32 guest_cid, struct net *net)
   return NULL;
}

+static bool vhost_transport_has_cid(u32 cid)
+{
+    bool found;
+
+    rcu_read_lock();
+    found = vhost_vsock_get(cid) != NULL;
We recently added namespaces support that changed 
vhost_vsock_get() params. This is also in net tree now and in 
Linus' tree, so not sure where this patch is based, but this needs 
to be rebased since it is not building:

../drivers/vhost/vsock.c: In function ‘vhost_transport_has_cid’:
../drivers/vhost/vsock.c:99:17: error: too few arguments to function ‘vhost_vsock_get’; expected 2, have 1
 99 |         found = vhost_vsock_get(cid) != NULL;
    |                 ^~~~~~~~~~~~~~~
../drivers/vhost/vsock.c:74:28: note: declared here
 74 | static struct vhost_vsock *vhost_vsock_get(u32 guest_cid, struct net *net)
    |

D'oh. Sorry, I built this on 6.19 and only realized after the send 
that namespace support got in. Will fix up for v2.
Thanks.
+    rcu_read_unlock();
+    return found;
+}
+
static void
vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
               struct vhost_virtqueue *vq)
@@ -424,6 +434,7 @@ static struct virtio_transport vhost_transport = {
       .module                   = THIS_MODULE,

       .get_local_cid            = vhost_transport_get_local_cid,
+        .has_cid                  = vhost_transport_has_cid,

       .init                     = virtio_transport_do_socket_init,
       .destruct                 = virtio_transport_destruct,
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 533d8e75f7bb..4cdcb72f9765 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -179,6 +179,9 @@ struct vsock_transport {
   /* Addressing. */
   u32 (*get_local_cid)(void);

+    /* Check if this transport serves a specific remote CID. */
+    bool (*has_cid)(u32 cid);
What about "has_remote_cid" ?
+
   /* Read a single skb */
   int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 2f7d94d682cb..8b34b264b246 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -584,6 +584,9 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
       else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
            (remote_flags & VMADDR_FLAG_TO_HOST))
           new_transport = transport_g2h;
+        else if (transport_h2g->has_cid &&
+             !transport_h2g->has_cid(remote_cid))
+            new_transport = transport_g2h;
We should update the comment on top of this function, and maybe 
also try to support the other H2G transport (i.e. VMCI).

@Bryan @Vishnu can the new has_cid()/has_remote_cid() be supported 
by VMCI too?
Oops, I forgot to CC them, now they should be in copy.

Ack. I can also take a quick look if it's trivial to add.
Great, thanks for that!

Stefano