Re: [PATCH v2 net 01/15] af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
From: Michal Luczaj <hidden>
Date: 2024-06-19 18:15:23
On 6/17/24 20:21, Kuniyuki Iwashima wrote:
> From: Michal Luczaj <redacted>
> Date: Mon, 17 Jun 2024 01:28:52 +0200
>> (...)
>> Another AF_UNIX sockmap issue is with OOB. When OOB packet is sent, skb is
>> added to recv queue, but also u->oob_skb is set. Here's the problem: when
>> this skb goes through bpf_sk_redirect_map() and is moved between socks,
>> oob_skb remains set on the original sock.
>
> Good catch!
>
>> [ 23.688994] WARNING: CPU: 2 PID: 993 at net/unix/garbage.c:351 unix_collect_queue+0x6c/0xb0
>> [ 23.689019] CPU: 2 PID: 993 Comm: kworker/u32:13 Not tainted 6.10.0-rc2+ #137
>> [ 23.689021] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
>> [ 23.689024] Workqueue: events_unbound __unix_gc
>> [ 23.689027] RIP: 0010:unix_collect_queue+0x6c/0xb0
>>
>> I wanted to write a patch, but then I realized I'm not sure what's the
>> expected behaviour. Should the oob_skb setting follow to the skb's new sock
>> or should it be dropped (similarly to what is happening today with
>> scm_fp_list, i.e. redirect strips inflights)?
>
> The former will require large refactoring as we need to check if the
> redirect happens for BPF_F_INGRESS and if the redirected sk is also
> SOCK_STREAM etc.
>
> So, I'd go with the latter. Probably we can check if skb is u->oob_skb and
> drop OOB data and retry next in unix_stream_read_skb(), and forbid MSG_OOB
> in unix_bpf_recvmsg().
> (...)

Yeah, sounds reasonable. I'm just not sure I understand the retry part. For
each skb_queue_tail() there's one ->sk_data_ready() (which does
->read_skb()). Why bother with a retry?
This is what I was thinking:
 static int unix_stream_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 {
+	struct unix_sock *u = unix_sk(sk);
+	struct sk_buff *skb;
+	int err;
+
 	if (unlikely(READ_ONCE(sk->sk_state) != TCP_ESTABLISHED))
 		return -ENOTCONN;
 
-	return unix_read_skb(sk, recv_actor);
+	mutex_lock(&u->iolock);
+	skb = skb_recv_datagram(sk, MSG_DONTWAIT, &err);
+
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+	if (skb) {
+		bool drop = false;
+
+		spin_lock(&sk->sk_receive_queue.lock);
+		if (skb == u->oob_skb) {
+			WRITE_ONCE(u->oob_skb, NULL);
+			drop = true;
+		}
+		spin_unlock(&sk->sk_receive_queue.lock);
+
+		if (drop) {
+			WARN_ON_ONCE(skb_unref(skb));
+			kfree_skb(skb);
+			skb = NULL;
+			err = 0;
+		}
+	}
+#endif
+
+	mutex_unlock(&u->iolock);
+	return skb ? recv_actor(sk, skb) : err;
 }
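
To make the intended behaviour easier to poke at outside a kernel tree, here is a rough userspace model of the drop logic in the diff above. It is only a sketch: struct toy_sock, struct toy_skb and the toy_*() helpers are hypothetical stand-ins invented for illustration, not kernel APIs, and locking and refcounting are left out entirely. The return values (1 = handed to the actor, 0 = OOB skb dropped, -1 = queue empty) are likewise an assumption of the model, not what the kernel function returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff. */
struct toy_skb {
	struct toy_skb *next;
	int id;
};

/* Hypothetical stand-in for struct sock plus struct unix_sock. */
struct toy_sock {
	struct toy_skb *head;		/* models sk_receive_queue */
	struct toy_skb *tail;
	struct toy_skb *oob_skb;	/* models u->oob_skb */
	int delivered;			/* skbs handed to the actor */
	int dropped;			/* OOB skbs discarded */
};

/* Models queueing on send: OOB data is queued and also remembered. */
static void toy_queue_tail(struct toy_sock *sk, struct toy_skb *skb, bool oob)
{
	skb->next = NULL;
	if (sk->tail)
		sk->tail->next = skb;
	else
		sk->head = skb;
	sk->tail = skb;
	if (oob)
		sk->oob_skb = skb;
}

/* Models a non-blocking dequeue; returns NULL when the queue is empty. */
static struct toy_skb *toy_dequeue(struct toy_sock *sk)
{
	struct toy_skb *skb = sk->head;

	if (!skb)
		return NULL;
	sk->head = skb->next;
	if (!sk->head)
		sk->tail = NULL;
	skb->next = NULL;
	return skb;
}

/*
 * Models the proposed read path: dequeue one skb, and if it is the
 * OOB skb, clear the pointer and drop it instead of redirecting it.
 */
static int toy_read_skb(struct toy_sock *sk)
{
	struct toy_skb *skb = toy_dequeue(sk);

	if (!skb)
		return -1;
	if (skb == sk->oob_skb) {
		sk->oob_skb = NULL;
		sk->dropped++;
		return 0;
	}
	sk->delivered++;
	return 1;
}
```

In this model every queued skb is matched by exactly one toy_read_skb() call, mirroring the one ->sk_data_ready() per skb_queue_tail() argument: dropping the OOB skb uses up its own notification, and the next skb is picked up by the next call without a retry.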