Re: Is bug 200755 in anyone's queue??
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2019-09-04 14:24:12
On Wed, Sep 4, 2019 at 8:23 AM Eric Dumazet [off-list ref] wrote:
On 9/4/19 2:00 PM, Mark KEATON wrote:
Hi Willem, I am the person who commented on the original bug report in bugzilla. In communicating with Steve just now about possible solutions that maintain the efficiency that you are after, what would you think of the following: keep two lists of UDP sockets, those connected and those not connected, and always search the connected list first.

This was my suggestion. Note that this requires adding yet another hash table, and yet another lookup (another cache line miss per incoming packet). This lookup will slow down DNS and QUIC servers, or any application solely using not connected sockets.
Exactly. The only way around it that I see is to keep the single list and optionally mark a struct sock_reuseport as having no connected members, in which case the search can break on the first reuseport match, as it does today.

"
On top of the main patch it requires something like
@@ -22,6 +22,7 @@ struct sock_reuseport {
        /* ID stays the same even after the size of socks[] grows. */
        unsigned int reuseport_id;
        bool bind_inany;
+       unsigned int connected;
        struct bpf_prog __rcu *prog; /* optional BPF sock selector */
        struct sock *socks[0]; /* array of sock pointers */
 };
@@ -73,6 +74,15 @@ int __ip4_datagram_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len
        sk_set_txhash(sk);
        inet->inet_id = jiffies;
+       if (rcu_access_pointer(sk->sk_reuseport_cb)) {
+               struct sock_reuseport *reuse;
+
+               rcu_read_lock();
+               reuse = rcu_dereference(sk->sk_reuseport_cb);
+               reuse->connected = 1;
+               rcu_read_unlock();
+       }
+
        sk_dst_set(sk, &rt->dst);
        err = 0;
"
plus a way for reuseport_select_sock to communicate that. Probably a
variant __reuseport_select_sock with an extra argument.
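
To make the intended lookup behaviour concrete, here is a rough userspace model of the idea. It is not kernel code and not the actual patch: struct group, struct sock_model, score(), lookup() and model_connect() are hypothetical names made up for illustration, and where the real code would pick a group member by hash, this model simply returns the first match. It only shows that the search can stop at the first reuseport match while the group has no connected members, and has to keep scanning for a better-scoring socket once one member has connected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-group state (models the proposed flag). */
struct group {
    bool has_connected;
};

/* Hypothetical stand-in for a UDP socket; peer_addr == 0 means unconnected. */
struct sock_model {
    uint16_t local_port;
    uint32_t peer_addr;
    uint16_t peer_port;
    struct group *grp;      /* NULL if the socket is not in a reuseport group */
};

/* Models connect(): record the peer and mark the group as having a connected member. */
static void model_connect(struct sock_model *sk, uint32_t addr, uint16_t port)
{
    assert(sk != NULL);
    sk->peer_addr = addr;
    sk->peer_port = port;
    if (sk->grp)
        sk->grp->has_connected = true;
}

/* -1: no match, 1: unconnected socket on the port, 2: exact match on the peer too. */
static int score(const struct sock_model *sk, uint16_t dport,
                 uint32_t saddr, uint16_t sport)
{
    if (sk->local_port != dport)
        return -1;
    if (sk->peer_addr == 0)
        return 1;
    if (sk->peer_addr == saddr && sk->peer_port == sport)
        return 2;
    return -1;
}

/* Single-list lookup; *visited reports how many sockets were examined. */
static const struct sock_model *lookup(const struct sock_model *socks, size_t n,
                                       uint16_t dport, uint32_t saddr,
                                       uint16_t sport, size_t *visited)
{
    const struct sock_model *best = NULL;
    int best_score = 0;

    *visited = 0;
    for (size_t i = 0; i < n; i++) {
        int s = score(&socks[i], dport, saddr, sport);

        (*visited)++;
        if (s > best_score) {
            best = &socks[i];
            best_score = s;
            /* Fast path: no connected members, so the first match is final. */
            if (socks[i].grp && !socks[i].grp->has_connected)
                break;
        }
    }
    return best;
}
```

With three unconnected sockets in one group, lookup() examines one entry and returns. After model_connect() on the last socket, a packet from that peer walks the whole list and lands on the connected socket, while packets from other peers still go to an unconnected member.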
As for BPF: the example I pointed out does read IP addresses and uses
a BPF map for socket selection. But as that feature is new with 4.19
it is probably moot for this purpose, as we are targeting a fix that
can be backported to 4.19 stable.