Thread (6 messages) 6 messages, 3 authors, 2026-02-19

Re: [PATCH net] vsock: lock down child_ns_mode as write-once

From: Bobby Eshleman <hidden>
Date: 2026-02-18 16:15:42
Also in: kvm, linux-kselftest, lkml, virtualization

On Wed, Feb 18, 2026 at 11:43:42AM +0100, Stefano Garzarella wrote:
On Wed, Feb 18, 2026 at 11:02:02AM +0100, Stefano Garzarella wrote:
quoted
On Tue, Feb 17, 2026 at 05:45:10PM -0800, Bobby Eshleman wrote:
quoted
From: Bobby Eshleman <redacted>

To improve the security posture of vsock namespacing, this patch locks
down the vsock child_ns_mode sysctl setting with a write-once policy.
The user may write to child_ns_mode only once in each namespace, making
changes to either local or global mode be irreversible.

This avoids security breaches where a process in a local namespace may
attempt to jailbreak into the global vsock ns space by setting
child_ns_mode to "global", creating a new namespace, and accessing the
global space through the new namespace.
Commit 6a997f38bdf8 ("vsock: prevent child netns mode switch from local
to global") should avoid exactly that, so I don't get this. Can you
elaborate more how this can happen without this patch?

I think here we should talk more about what we described in
https://lore.kernel.org/netdev/aZNNBc390y6V09qO@sgarzare-redhat/ (local) which
is that two administrator processes could compete in setting
`child_ns_mode` and end up creating a namespace with an `ns_mode`
different from the one set in `child_ns_mode`. But I would also explain
that this can still be detected by the process by looking at `ns_mode`
and trying again.  With this patch, we avoid this and allow the
namespace manager to set it once and be sure that it cannot be changed
again.
Oh that's right, I lost track of the original intent when writing this.
quoted
quoted
Additionally, fix the test functions that this change would otherwise
break by adding "global-parent" and "local-parent" namespaces and using
them as intermediaries to spawn namespaces in the given modes. This
avoids the need to change "child_ns_mode" in the init_ns. nsenter must
be used because ip netns unshares the mount namespace so nested "ip
netns add" breaks exec calls from the init ns.
I'm not sure what the policy is in netdev, but I would prefer to have
selftest changes in another patch (I think earlier in the series so as
not to break the bisection), in order to simplify backporting (e.g. in
CentOS Stream, to keep the backport small, I didn't backport the dozens
of patches for selftest that we did previously).
Sounds good! I wasn't sure if breakage so tightly coupled should be in
the same patch or not, I'm happy to split it up to ease backporting.
quoted
Obviously, if it's not possible and breaks the bisection, I can safely
skip these changes during the backport.
quoted
Test run:

1..25
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 ns_host_vsock_ns_mode_ok
ok 5 ns_host_vsock_child_ns_mode_ok
ok 6 ns_global_same_cid_fails
ok 7 ns_local_same_cid_ok
ok 8 ns_global_local_same_cid_ok
ok 9 ns_local_global_same_cid_ok
ok 10 ns_diff_global_host_connect_to_global_vm_ok
ok 11 ns_diff_global_host_connect_to_local_vm_fails
ok 12 ns_diff_global_vm_connect_to_global_host_ok
ok 13 ns_diff_global_vm_connect_to_local_host_fails
ok 14 ns_diff_local_host_connect_to_local_vm_fails
ok 15 ns_diff_local_vm_connect_to_local_host_fails
ok 16 ns_diff_global_to_local_loopback_local_fails
ok 17 ns_diff_local_to_global_loopback_fails
ok 18 ns_diff_local_to_local_loopback_fails
ok 19 ns_diff_global_to_global_loopback_ok
ok 20 ns_same_local_loopback_ok
ok 21 ns_same_local_host_connect_to_local_vm_ok
ok 22 ns_same_local_vm_connect_to_local_host_ok
ok 23 ns_delete_vm_ok
ok 24 ns_delete_host_ok
ok 25 ns_delete_both_ok
SUMMARY: PASS=25 SKIP=0 FAIL=0
IMO this can be removed from the commit message, doesn't add much value
other than say that all test passes.
sgtm!
quoted
quoted
Fixes: eafb64f40ca4 ("vsock: add netns to vsock core")
Signed-off-by: Bobby Eshleman <redacted>
Suggested-by: Daan De Meyer <redacted>
Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
---
include/net/af_vsock.h                  |  6 +++++-
include/net/netns/vsock.h               |  1 +
net/vmw_vsock/af_vsock.c                | 10 ++++++----
tools/testing/selftests/vsock/vmtest.sh | 35
+++++++++++++++------------------
4 files changed, 28 insertions(+), 24 deletions(-)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index d3ff48a2fbe0..c7de33039907 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -276,10 +276,14 @@ static inline bool vsock_net_mode_global(struct vsock_sock *vsk)
	return vsock_net_mode(sock_net(sk_vsock(vsk))) == VSOCK_NET_MODE_GLOBAL;
}

-static inline void vsock_net_set_child_mode(struct net *net,
+static inline bool vsock_net_set_child_mode(struct net *net,
					    enum vsock_net_mode mode)
{
+	if (xchg(&net->vsock.child_ns_mode_locked, 1))
+		return false;
+
	WRITE_ONCE(net->vsock.child_ns_mode, mode);
+	return true;
}

static inline enum vsock_net_mode vsock_net_child_mode(struct net *net)
diff --git a/include/net/netns/vsock.h b/include/net/netns/vsock.h
index b34d69a22fa8..8c855fff8039 100644
--- a/include/net/netns/vsock.h
+++ b/include/net/netns/vsock.h
@@ -17,5 +17,6 @@ struct netns_vsock {
	enum vsock_net_mode mode;
	enum vsock_net_mode child_ns_mode;
+	int child_ns_mode_locked;
};
#endif /* __NET_NET_NAMESPACE_VSOCK_H */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 9880756d9eff..35e097f4fde8 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -90,14 +90,15 @@
*
*   - /proc/sys/net/vsock/ns_mode (read-only) reports the current namespace's
*     mode, which is set at namespace creation and immutable thereafter.
- *   - /proc/sys/net/vsock/child_ns_mode (writable) controls what mode future
+ *   - /proc/sys/net/vsock/child_ns_mode (write-once) controls what mode future
*     child namespaces will inherit when created. The initial value matches
*     the namespace's own ns_mode.
*
*   Changing child_ns_mode only affects newly created namespaces, not the
*   current namespace or existing children. A "local" namespace cannot set
- *   child_ns_mode to "global". At namespace creation, ns_mode is inherited
- *   from the parent's child_ns_mode.
+ *   child_ns_mode to "global". child_ns_mode is write-once, so that it may
+ *   be configured and locked down by a namespace manager. At namespace
+ *   creation, ns_mode is inherited from the parent's child_ns_mode.
We just merged commit a07c33c6f2fc ("vsock: document namespace mode
sysctls") in the net tree, so we should update also
Documentation/admin-guide/sysctl/net.rst
Indeed.
quoted
quoted
*
*   The init_net mode is "global" and cannot be modified.
Maybe we should also emphasise in the documentation and in the commit
description that `child_ns_mode` in `init_net` also is write-once, so
writing `local` to that one by the init process (e.g. systemd),
essentially will make all the new namespaces in `local` mode. This could
be useful for container/namespace managers.
Sounds good.
quoted
quoted
*
@@ -2853,7 +2854,8 @@ static int vsock_net_child_mode_string(const struct ctl_table *table, int write,
		    new_mode == VSOCK_NET_MODE_GLOBAL)
			return -EPERM;

-		vsock_net_set_child_mode(net, new_mode);
+		if (!vsock_net_set_child_mode(net, new_mode))
+			return -EPERM;
So, if `child_ns_mode` is set to `local` but locked, writing `local`
again will return -EPERM, is this really what we want?

I'm not sure if we can relax it a bit, but then we may race between
reader and writer, so maybe it's fine like it is in this patch, but we
should document better that any writes (even same value) after the first
one will return -EPERM.
I think we can try in this way:

static inline bool vsock_net_set_child_mode(struct net *net,
					    enum vsock_net_mode mode)
{
	int new_locked = mode + 1;
	int old_locked = 0;

	if (try_cmpxchg(&net->vsock.child_ns_mode_locked,
			&old_locked, new_locked)) {
		WRITE_ONCE(net->vsock.child_ns_mode, mode);
		return true;
	}

	return old_locked == new_locked;
}

With a comment like this near child_ns_mode_locked in struct netns_vsock:
/* 0 = unlocked
 * 1 = locked to GLOBAL (VSOCK_NET_MODE_GLOBAL + 1)
 * 2 = locked to LOCAL  (VSOCK_NET_MODE_LOCAL + 1)
 */
This looks good to me. Will change.
While writing that, I thought that we can even remove `child_ns_mode_locked`
and use a single variable where encode the value and the state, but maybe
it's an unnecessary extra complexity.

Stefano
quoted
About that, should we return something different, like -EBUSY ?
Not a strong opinion, just to differentiate with the other check before.

Thanks,
Stefano
Differentiating with -EBUSY sounds useful. Will add that and document.

Best,
Bobby
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help