Re: [PATCH bpf] bpf, sockmap: Fix af_unix null-ptr-deref in proto update
From: Michal Luczaj <hidden>
Date: 2026-01-30 11:00:31
Also in:
bpf, lkml
On 1/29/26 20:41, Martin KaFai Lau wrote:
> On 1/29/26 8:47 AM, Michal Luczaj wrote:
>> BPF_MAP_UPDATE_ELEM races unix_stream_connect(): when
>> sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED),
>> unix_peer(sk) in unix_stream_bpf_update_proto() may still return NULL.
>>
>> T0 bpf                          T1 connect
>> ------                          ----------
>>
>>                                 WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)
>> sock_map_sk_state_allowed(sk)
>> ...
>> sk_pair = unix_peer(sk)
>> sock_hold(sk_pair)
>>                                 sock_hold(newsk)
>>                                 smp_mb__after_atomic()
>>                                 unix_peer(sk) = newsk
>>
>> BUG: kernel NULL pointer dereference, address: 0000000000000080
>> RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0
>> Call Trace:
>>   sock_map_link+0x564/0x8b0
>>   sock_map_update_common+0x6e/0x340
>>   sock_map_update_elem_sys+0x17d/0x240
>>   __sys_bpf+0x26db/0x3250
>>   __x64_sys_bpf+0x21/0x30
>>   do_syscall_64+0x6b/0x3a0
>>   entry_SYSCALL_64_after_hwframe+0x76/0x7e
>>
>> Follow-up to discussion at
>> https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/
>
> It is a long thread to dig. Please summarize the discussion in the commit message.
OK, there we go:

The root cause of the null-ptr-deref is that unix_stream_connect() sets sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that the socket is properly set up, which would include having a defined peer. In other words, there's a window in which unix_stream_bpf_update_proto() can be called on a socket that still has unix_peer(sk) == NULL.

My initial idea was to simply move the peer assignment _before_ the sk_state update, but the maintainer wasn't interested in changing the unix_stream_connect() hot path. He suggested taking care of it in the sockmap code.

My understanding is that users are not supposed to put a socket in a sockmap while that socket is only halfway through a connect() call. Hence `return -EINVAL` on a missing peer. Now, if users should be allowed to legally race connect() vs. sockmap update, then I guess we can wait for connect() to "finalize", e.g. by taking the unix_state_lock(), as discussed below.
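To make the window concrete, here is a minimal single-threaded userspace sketch of the ordering described above. It is not kernel code: the `model_*` names, the struct layouts, and the split of connect() into two helper functions are all assumptions made for illustration. Only the ordering (state published before peer) and the `-EINVAL` on a missing peer come from the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace model only; these are NOT the kernel structures. */
enum { MODEL_CLOSE = 0, MODEL_ESTABLISHED = 1 };

struct model_sock {
	int state;                /* stands in for sk->sk_state */
	struct model_sock *peer;  /* stands in for unix_peer(sk) */
	int refcnt;               /* stands in for the sock refcount */
};

struct model_psock {
	struct model_sock *sk_pair;
};

/* First half of connect(): the state becomes visible before the peer. */
static void model_connect_publish_state(struct model_sock *sk)
{
	sk->state = MODEL_ESTABLISHED;
}

/* Second half of connect(): take a reference, then assign the peer. */
static void model_connect_assign_peer(struct model_sock *sk,
				      struct model_sock *newsk)
{
	newsk->refcnt++;
	sk->peer = newsk;
}

/* Models the patched update path: refuse a socket that has no peer yet. */
static int model_update_proto(struct model_sock *sk, struct model_psock *psock)
{
	struct model_sock *sk_pair;

	/* Stand-in for the state check; the error code is the model's choice. */
	if (sk->state != MODEL_ESTABLISHED)
		return -EOPNOTSUPP;

	if (!psock->sk_pair) {
		sk_pair = sk->peer;
		if (!sk_pair)
			return -EINVAL;  /* caught between the two connect() halves */

		sk_pair->refcnt++;
		psock->sk_pair = sk_pair;
	}
	return 0;
}
```

Calling `model_update_proto()` between the two connect helpers hits the `-EINVAL` branch; without that check the model would dereference a NULL `sk_pair`, which is the crash in the trace.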
> From looking at this commit message, if the existing lock_sock held by update_elem is not useful for af_unix,
Right, the existing lock_sock is not useful. update's lock_sock holds sock::sk_lock, while unix_state_lock() holds unix_sock::lock.
> it is not clear why a new test "!sk_pair" on top of the existing WRITE_ONCE(sk->sk_state...) is a fix.
"On top"? Just to make sure we're looking at the same thing: above I was trying to show two parallel flows with unix_peer() fetch in thread-0 and WRITE_ONCE(sk->sk_state...) and `unix_peer(sk) = newsk` in thread-1. It fixes the problem because now update_proto won't call sock_hold(NULL).
> A minor thing is sock_map_sk_state_allowed doesn't have READ_ONCE(sk->sk_state) for sk_is_stream_unix also.
Ok, I'll add this as a separate patch in v2. Along with the !tcp case of sock_map_redirect_allowed()?
> If unix_stream_connect does not hold lock_sock, can unix_state_lock be used here? lock_sock has already been taken, update_elem should not be the hot path.
Yes, it can be used; it was proposed in the old thread. In fact, the
critical section can be empty: it is only used to wait for
unix_stream_connect() to release the lock, which would guarantee
unix_peer(sk) != NULL by then.
 	if (!psock->sk_pair) {
+		unix_state_lock(sk);
+		unix_state_unlock(sk);
 		sk_pair = unix_peer(sk);
 		sock_hold(sk_pair);
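
The empty critical section can be modelled in userspace with a pthread mutex. This is only a sketch under assumptions: the `model_*` names are hypothetical, a pthread mutex stands in for unix_state_lock(), and the connect side is assumed to hold that lock across both the state write and the peer assignment, as the reply above implies.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace model only: a pthread mutex stands in for unix_state_lock(). */
struct model_usock {
	pthread_mutex_t state_lock;
	atomic_int established;  /* stands in for sk->sk_state */
	void *peer;              /* stands in for unix_peer(sk) */
};

static int model_peer_token;     /* hypothetical stand-in for newsk */

/* Models connect(): state first, peer later, both under the lock. */
static void *model_connect_thread(void *arg)
{
	struct model_usock *u = arg;

	pthread_mutex_lock(&u->state_lock);
	atomic_store(&u->established, 1);
	/* Window: the state is visible here, the peer is not set yet. */
	u->peer = &model_peer_token;
	pthread_mutex_unlock(&u->state_lock);
	return NULL;
}

/* Models the update path: lock and unlock only to wait out connect(). */
static void *model_update_wait_for_peer(struct model_usock *u)
{
	/* Stand-in for the sk_state check passing. */
	while (!atomic_load(&u->established))
		;

	/*
	 * Empty critical section. Having seen the state, we can only get
	 * the lock after the connect side has released it, and by then
	 * the peer has been assigned.
	 */
	pthread_mutex_lock(&u->state_lock);
	pthread_mutex_unlock(&u->state_lock);

	return u->peer;
}
```

In this model the returned peer is never NULL once the state has been observed, because the state store happens inside the connect side's critical section and the lock/unlock pair cannot complete until that section ends.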
>> Fixes: 8866730aed51 ("bpf, sockmap: af_unix stream sockets need to hold ref for pair sock")
>> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com>
>> Signed-off-by: Michal Luczaj <redacted>
>> ---
>> Re-triggered while working on an unrelated selftest:
>> https://lore.kernel.org/bpf/20260123-selftest-signal-on-connect-v1-0-b0256e7025b6@rbox.co/
>> ---
>>  net/unix/unix_bpf.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/net/unix/unix_bpf.c b/net/unix/unix_bpf.c
>> index e0d30d6d22ac..57f3124c9d8d 100644
>> --- a/net/unix/unix_bpf.c
>> +++ b/net/unix/unix_bpf.c
>> @@ -185,6 +185,9 @@ int unix_stream_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool r
>>  	 */
>>  	if (!psock->sk_pair) {
>>  		sk_pair = unix_peer(sk);
>> +		if (unlikely(!sk_pair))
>> +			return -EINVAL;
>> +
>>  		sock_hold(sk_pair);
>>  		psock->sk_pair = sk_pair;
>>  	}
>>
>> ---
>> base-commit: 63804fed149a6750ffd28610c5c1c98cce6bd377
>> change-id: 20260129-unix-proto-update-null-ptr-deref-6a2733bcbbf8
>>
>> Best regards,