Re: [PATCH bpf-next 4/8] bpf, sockmap: Don't let child socket inherit psock or its ops on copy
From: Martin Lau <hidden>
Date: 2019-11-26 17:17:33
Also in:
bpf
On Tue, Nov 26, 2019 at 04:54:33PM +0100, Jakub Sitnicki wrote:
On Mon, Nov 25, 2019 at 11:38 PM CET, Martin Lau wrote:quoted
On Sat, Nov 23, 2019 at 12:07:47PM +0100, Jakub Sitnicki wrote: [ ... ]quoted
@@ -370,6 +378,11 @@ static inline void sk_psock_restore_proto(struct sock *sk, sk->sk_prot = psock->sk_proto; psock->sk_proto = NULL; } + + if (psock->icsk_af_ops) { + icsk->icsk_af_ops = psock->icsk_af_ops; + psock->icsk_af_ops = NULL; + } }[ ... ]quoted
+static struct sock *tcp_bpf_syn_recv_sock(const struct sock *sk, + struct sk_buff *skb, + struct request_sock *req, + struct dst_entry *dst, + struct request_sock *req_unhash, + bool *own_req) +{ + const struct inet_connection_sock_af_ops *ops; + void (*write_space)(struct sock *sk); + struct sk_psock *psock; + struct proto *proto; + struct sock *child; + + rcu_read_lock(); + psock = sk_psock(sk); + if (likely(psock)) { + proto = psock->sk_proto; + write_space = psock->saved_write_space; + ops = psock->icsk_af_ops;It is not immediately clear to me what ensure ops is not NULL here. It is likely I missed something. A short comment would be very useful here.I can see the readability problem. Looking at it now, perhaps it should be rewritten, to the same effect, as: static struct sock *tcp_bpf_syn_recv_sock(...) { const struct inet_connection_sock_af_ops *ops = NULL; ... rcu_read_lock(); psock = sk_psock(sk); if (likely(psock)) { proto = psock->sk_proto; write_space = psock->saved_write_space; ops = psock->icsk_af_ops; } rcu_read_unlock(); if (!ops) ops = inet_csk(sk)->icsk_af_ops; child = ops->syn_recv_sock(sk, skb, req, dst, req_unhash, own_req); If psock->icsk_af_ops were NULL, it would mean we haven't initialized it properly. To double check what happens here:
I did not mean the init path. The init path is fine since it init eveything on psock before publishing the sk to the sock_map. I was thinking the delete path (e.g. sock_map_delete_elem). It is not clear to me what prevent the earlier pasted sk_psock_restore_proto() which sets psock->icsk_af_ops to NULL from running in parallel with tcp_bpf_syn_recv_sock()? An explanation would be useful.
In sock_map_link we do a setup dance where we first create the psock and
later initialize the socket callbacks (tcp_bpf_init).
static int sock_map_link(struct bpf_map *map, struct sk_psock_progs *progs,
struct sock *sk)
{
...
if (psock) {
...
} else {
psock = sk_psock_init(sk, map->numa_node);
if (!psock) {
ret = -ENOMEM;
goto out_progs;
}
sk_psock_is_new = true;
}
...
if (sk_psock_is_new) {
ret = tcp_bpf_init(sk);
if (ret < 0)
goto out_drop;
} else {
tcp_bpf_reinit(sk);
}
The "if (sk_psock_new)" branch triggers the call chain that leads to
saving & overriding socket callbacks.
tcp_bpf_init -> tcp_bpf_update_sk_prot -> sk_psock_update_proto
Among them, icsk_af_ops.
static inline void sk_psock_update_proto(...)
{
...
psock->icsk_af_ops = icsk->icsk_af_ops;
icsk->icsk_af_ops = af_ops;
}
Goes without saying that a comment is needed.
Thanks for the feedback,
Jakub