Thread (24 messages), 3 authors, 2026-02-07

Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update

From: Michal Luczaj <hidden>
Date: 2026-01-30 11:00:31
Also in: bpf, lkml

On 1/29/26 20:41, Martin KaFai Lau wrote:
> On 1/29/26 8:47 AM, Michal Luczaj wrote:
>> BPF_MAP_UPDATE_ELEM races unix_stream_connect(): when
>> sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED),
>> unix_peer(sk) in unix_stream_bpf_update_proto() may still return NULL.
>>
>> 	T0 bpf				T1 connect
>> 	------				----------
>>
>> 					WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
>> 	sock_map_sk_state_allowed(sk)
>> 	...
>> 	sk_pair = unix_peer(sk)
>> 	sock_hold(sk_pair)
>> 					sock_hold(newsk)
>> 					smp_mb__after_atomic()
>> 					unix_peer(sk) = newsk
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000080
>> RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
>> Call Trace:
>>   sock_map_link+0x564/0x8b0
>>   sock_map_update_common+0x6e/0x340
>>   sock_map_update_elem_sys+0x17d/0x240
>>   __sys_bpf+0x26db/0x3250
>>   __x64_sys_bpf+0x21/0x30
>>   do_syscall_64+0x6b/0x3a0
>>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Follow-up to discussion at
>> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/.
>
> It is a long thread to dig. Please summarize the discussion in the
> commit message.

OK, there we go:

The root cause of the null-ptr-deref is that unix_stream_connect() sets
sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns
a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes
sock_map_sk_state_allowed() believe that the socket is fully set up, which
would include having a defined peer.

In other words, there is a window in which unix_stream_bpf_update_proto()
can be called on a socket that still has unix_peer(sk) == NULL.
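
A minimal userspace sketch of that window, not kernel code: every model_*
name below is a hypothetical stand-in, and connect() is split in two so the
interleaving from the diagram can be replayed step by step. The error
returned for a not-yet-established socket is this model's choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's struct sock. */
enum { MODEL_CLOSE = 0, MODEL_ESTABLISHED = 1 };

struct model_sock {
	int state;                /* models sk->sk_state */
	struct model_sock *peer;  /* models unix_peer(sk) */
	int refcnt;               /* models references taken by sock_hold() */
};

/* Models the state check done before the proto update. */
static int model_state_allowed(const struct model_sock *sk)
{
	return sk->state == MODEL_ESTABLISHED;
}

/* Models the patched update path: refuse a socket that has no peer yet. */
static int model_update_proto(struct model_sock *sk, struct model_sock **sk_pair)
{
	struct model_sock *peer;

	assert(sk_pair != NULL);

	if (!model_state_allowed(sk))
		return -EOPNOTSUPP;  /* value chosen for this model only */

	peer = sk->peer;
	if (!peer)
		return -EINVAL;      /* without this check: hold on a NULL peer */

	peer->refcnt++;
	*sk_pair = peer;
	return 0;
}

/* First half of connect(): the state becomes visible. */
static void model_connect_set_state(struct model_sock *sk)
{
	sk->state = MODEL_ESTABLISHED;
}

/* Second half of connect(): only now is the peer assigned. */
static void model_connect_set_peer(struct model_sock *sk, struct model_sock *newsk)
{
	newsk->refcnt++;
	sk->peer = newsk;
}
```

Calling model_update_proto() between the two connect halves hits the
-EINVAL branch; calling it after model_connect_set_peer() succeeds and
takes a reference on the peer.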

My initial idea was to simply move peer assignment _before_ the sk_state
update, but the maintainer wasn't interested in changing the
unix_stream_connect() hot path. He suggested taking care of it in the
sockmap code.

My understanding is that users are not supposed to put a socket in a sockmap
while that socket is only halfway through a connect() call. Hence the
`return -EINVAL` on a missing peer. Now, if users should be allowed to
legally race connect() vs. sockmap update, then I guess we can wait for
connect() to "finalize", e.g. by taking the unix_state_lock(), as discussed
below.

> From looking at this commit message, if the existing lock_sock held by
> update_elem is not useful for af_unix,

Right, the existing lock_sock is not useful. update's lock_sock holds
sock::sk_lock, while unix_state_lock() holds unix_sock::lock.

> it is not clear why a new test
> "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix.

"On top"? Just to make sure we're looking at the same thing: above I was
trying to show two parallel flows with unix_peer() fetch in thread-0 and
WRITE_ONCE(sk->sk_state...) and `unix_peer(sk) = newsk` in thread-1.

It fixes the problem because now update_proto won't call sock_hold(NULL).

> A minor thing is sock_map_sk_state_allowed doesn't have
> READ_ONCE(sk->sk_state) for sk_is_stream_unix also.

Ok, I'll add this as a separate patch in v2. Along with the !tcp case of
sock_map_redirect_allowed()?

> If unix_stream_connect does not hold lock_sock, can unix_state_lock be
> used here? lock_sock has already been taken, update_elem should not be
> the hot path.

Yes, it can be used; it was proposed in the old thread. In fact, the
critical section can be empty: the lock is taken only to wait for
unix_stream_connect() to release it, which guarantees unix_peer(sk) != NULL
by then.

        if (!psock->sk_pair) {
+               unix_state_lock(sk);
+               unix_state_unlock(sk);
                sk_pair = unix_peer(sk);
                sock_hold(sk_pair);
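
The empty lock/unlock pair can be tried out in userspace. This is a sketch
under stated assumptions, not kernel code: a pthread mutex stands in for
unix_sock::lock, all demo_* names are hypothetical, and the connecting
thread is assumed to hold the lock across both the state update and the
peer assignment.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
enum { DEMO_CLOSE = 0, DEMO_ESTABLISHED = 1 };

struct demo_sock {
	pthread_mutex_t lock;           /* models unix_sock::lock */
	atomic_int state;               /* models sk->sk_state */
	struct demo_sock *_Atomic peer; /* models unix_peer(sk) */
};

struct demo_connect_args {
	struct demo_sock *sk;
	struct demo_sock *newsk;
};

static void demo_sock_init(struct demo_sock *sk)
{
	pthread_mutex_init(&sk->lock, NULL);
	atomic_init(&sk->state, DEMO_CLOSE);
	atomic_init(&sk->peer, NULL);
}

/* Models connect(): state first, peer second, both under the lock. */
static void *demo_connect(void *arg)
{
	struct demo_connect_args *a = arg;

	pthread_mutex_lock(&a->sk->lock);
	atomic_store(&a->sk->state, DEMO_ESTABLISHED);
	sched_yield();  /* widen the window in which peer is still NULL */
	atomic_store(&a->sk->peer, a->newsk);
	pthread_mutex_unlock(&a->sk->lock);
	return NULL;
}

/* Models the update side: once the state check passes, an empty
 * critical section waits for the connecting thread to drop the lock. */
static struct demo_sock *demo_wait_for_peer(struct demo_sock *sk)
{
	while (atomic_load(&sk->state) != DEMO_ESTABLISHED)
		sched_yield();

	pthread_mutex_lock(&sk->lock);
	pthread_mutex_unlock(&sk->lock);

	return atomic_load(&sk->peer);
}

/* Runs both sides concurrently and returns the peer the updater saw. */
static struct demo_sock *demo_run(struct demo_sock *sk, struct demo_sock *newsk)
{
	struct demo_connect_args args = { .sk = sk, .newsk = newsk };
	struct demo_sock *seen;
	pthread_t t;

	demo_sock_init(sk);
	if (pthread_create(&t, NULL, demo_connect, &args) != 0)
		return NULL;

	seen = demo_wait_for_peer(sk);
	pthread_join(t, NULL);
	return seen;
}
```

Observing DEMO_ESTABLISHED means the connecting thread is inside its
critical section or already past it, so by the time the updater's own
lock/unlock completes the peer has been stored.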

>> Fixes: 8866730aed51 ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock")
>> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
>> Signed-off-by: Michal Luczaj <redacted>
>> ---
>> Re-triggered while working on an unrelated selftest:
>> https://lore.kernel.org/bpf/20260123-selftest-signal-on-connect-v1-0-b0256e7025b6@rbox.co/
>> ---
>>  net/unix/unix_bpf.c | 3 +++
>>  1 file changed, 3 insertions(+)
>> diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
>> index e0d30d6d22ac..57f3124c9d8d 100644
>> --- a/net/unix/unix_bpf.c
>> +++ b/net/unix/unix_bpf.c
>> @@ -185,6 +185,9 @@ int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool r
>>  	 */
>>  	if (!psock->sk_pair) {
>>  		sk_pair = unix_peer(sk);
>> +		if (unlikely(!sk_pair))
>> +			return -EINVAL;
>> +
>>  		sock_hold(sk_pair);
>>  		psock->sk_pair = sk_pair;
>>  	}
>> ---
>> base-commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
>> change-id: 20260129-unix-proto-update-null-ptr-deref-6a2733bcbbf8

Best regards,
  