Thread (6 messages) 6 messages, 3 authors, 2025-02-04

Re: [syzbot] [net?] general protection fault in add_wait_queue

From: Michal Luczaj <hidden>
Date: 2025-02-04 00:38:58
Also in: kvm, lkml, virtualization
Subsystem: networking [general], the rest, vm sockets (af_vsock) · Maintainers: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds, Stefano Garzarella

On 2/3/25 10:57, syzbot wrote:
Hello,

syzbot found the following issue on:

HEAD commit:    c2933b2befe2 Merge tag 'net-6.14-rc1' of git://git.kernel...
git tree:       net-next
console output: https://syzkaller.appspot.com/x/log.txt?x=16f676b0580000
kernel config:  https://syzkaller.appspot.com/x/.config?x=d033b14aeef39158
dashboard link: https://syzkaller.appspot.com/bug?extid=9d55b199192a4be7d02c
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13300b24580000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=12418518580000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/c7667ae12603/disk-c2933b2b.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/944ca63002c1/vmlinux-c2933b2b.xz
kernel image: https://storage.googleapis.com/syzbot-assets/30748115bf0b/bzImage-c2933b2b.xz

The issue was bisected to:

commit fcdd2242c0231032fc84e1404315c245ae56322a
Author: Michal Luczaj [off-list ref]
Date:   Tue Jan 28 13:15:27 2025 +0000

    vsock: Keep the binding until socket destruction
syzbot is correct (thanks), bisected commit introduced a regression.

sock_orphan(sk) is being called without taking into consideration that it
does `sk->sk_wq = NULL`. Later, if SO_LINGER is set, sk->sk_wq gets
dereferenced in virtio_transport_wait_close().

Repro, as shown by syzbot, is simply
from socket import *
lis = socket(AF_VSOCK, SOCK_STREAM)
lis.bind((1, 1234)) # VMADDR_CID_LOCAL
lis.listen()
s = socket(AF_VSOCK, SOCK_STREAM)
s.setsockopt(SOL_SOCKET, SO_LINGER, (1<<32) | 1)
s.connect(lis.getsockname())
s.close()

A way of fixing this is to put sock_orphan(sk) back where it was before the
breaking patch and instead explicitly flip just the SOCK_DEAD bit, i.e.
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 075695173648..06250bb9afe2 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -824,13 +824,14 @@ static void __vsock_release(struct sock *sk, int level)
 	 */
 	lock_sock_nested(sk, level);
 
-	sock_orphan(sk);
+	sock_set_flag(sk, SOCK_DEAD);
 
 	if (vsk->transport)
 		vsk->transport->release(vsk);
 	else if (sock_type_connectible(sk->sk_type))
 		vsock_remove_sock(vsk);
 
+	sock_orphan(sk);
 	sk->sk_shutdown = SHUTDOWN_MASK;
 
 	skb_queue_purge(&sk->sk_receive_queue);
I'm not sure this is the most elegant code (sock_orphan(sk) sets SOCK_DEAD
on a socket that is already SOCK_DEAD), but here it goes:
https://lore.kernel.org/netdev/20250204-vsock-linger-nullderef-v1-0-6eb1760fa93e@rbox.co/ (local)

One more note: man socket(7) says lingering also happens on shutdown().
Should vsock follow that?

Thanks,
Michal
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help