Thread (11 messages) 11 messages, 5 authors, 2025-09-03

Re: [PATCH net] inet: Avoid established lookup missing active sk

From: luoxuanqiang <hidden>
Date: 2025-09-03 07:55:25

在 2025/9/3 13:48, Kuniyuki Iwashima 写道:
On Tue, Sep 2, 2025 at 10:16 PM Kuniyuki Iwashima [off-list ref] wrote:
quoted
On Tue, Sep 2, 2025 at 7:45 PM Xuanqiang Luo [off-list ref] wrote:
quoted
From: Xuanqiang Luo <redacted>

Since the lookup of sk in ehash is lockless, when one CPU is performing a
lookup while another CPU is executing delete and insert operations
(deleting reqsk and inserting sk), the lookup CPU may miss either of
them, if sk cannot be found, an RST may be sent.

The call trace map is drawn as follows:
    CPU 0                           CPU 1
    -----                           -----
                                 spin_lock()
                                 sk_nulls_del_node_init_rcu(osk)
__inet_lookup_established()
                                 __sk_nulls_add_node_rcu(sk, list)
                                 spin_unlock()
This usually does not happen except for local communication, and
retrying on the client side is much better than penalising all lookups
for SYN.
quoted
We can try using spin_lock()/spin_unlock() to wait for ehash updates
(ensuring all deletions and insertions are completed) after a failed
lookup in ehash, then lookup sk again after the update. Since the sk
expected to be found is unlikely to encounter the aforementioned scenario
multiple times consecutively, we only need one update.

Similarly, an issue occurs in tw hashdance. Try adjusting the order in
which it operates on ehash: remove sk first, then add tw. If sk is missed
during lookup, it will likewise wait for the update to find tw, without
worrying about the skc_refcnt issue that would arise if tw were found
first.

Fixes: 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls")
Signed-off-by: Xuanqiang Luo <redacted>
---
  net/ipv4/inet_hashtables.c    | 12 ++++++++++++
  net/ipv4/inet_timewait_sock.c |  9 ++++-----
  2 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ceeeec9b7290..4eb3a55b855b 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -505,6 +505,7 @@ struct sock *__inet_lookup_established(const struct net *net,
         unsigned int hash = inet_ehashfn(net, daddr, hnum, saddr, sport);
         unsigned int slot = hash & hashinfo->ehash_mask;
         struct inet_ehash_bucket *head = &hashinfo->ehash[slot];
+       bool try_lock = true;

  begin:
         sk_nulls_for_each_rcu(sk, node, &head->chain) {
@@ -528,6 +529,17 @@ struct sock *__inet_lookup_established(const struct net *net,
          */
         if (get_nulls_value(node) != slot)
                 goto begin;
+
+       if (try_lock) {
+               spinlock_t *lock = inet_ehash_lockp(hashinfo, hash);
+
+               try_lock = false;
+               spin_lock(lock);
+               /* Ensure ehash ops under spinlock complete. */
+               spin_unlock(lock);
+               goto begin;
+       }
+
  out:
         sk = NULL;
  found:
diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c
index 875ff923a8ed..a91e02e19c53 100644
--- a/net/ipv4/inet_timewait_sock.c
+++ b/net/ipv4/inet_timewait_sock.c
@@ -139,14 +139,10 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,

         spin_lock(lock);

-       /* Step 2: Hash TW into tcp ehash chain */
-       inet_twsk_add_node_rcu(tw, &ehead->chain);
You are adding a new RST scenario where the corresponding
socket is not found and a listener or no socket is found.

The try_lock part is not guaranteed to happen after twsk
insertion below.
I'm sorry, I didn't get it. If try_lock can search again, other places 
should have left the spinlock critical section. That means twsk should 
have been inserted already. Or maybe I missed some details. Could you 
please explain more clearly? Your guidance would be really helpful. :)
Oh no, spin_lock() dance sychronises the threads but I still
think this is rather harmful for normal cases; now sending
an unmatched packet can trigger lock dance, which is easily
abused for DDoS.
I agree this scenario is possible. It does make DDoS attacks more 
impactful. Thanks for pointing that out. However, this is the best 
approach I can think of right now. Thanks Xuanqiang
quoted
quoted
-
-       /* Step 3: Remove SK from hash chain */
+       /* Step 2: Remove SK from hash chain */
         if (__sk_nulls_del_node_init_rcu(sk))
                 sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1);

-
         /* Ensure above writes are committed into memory before updating the
          * refcount.
          * Provides ordering vs later refcount_inc().
@@ -161,6 +157,9 @@ void inet_twsk_hashdance_schedule(struct inet_timewait_sock *tw,
          */
         refcount_set(&tw->tw_refcnt, 3);

+       /* Step 3: Hash TW into tcp ehash chain */
+       inet_twsk_add_node_rcu(tw, &ehead->chain);
+
         inet_twsk_schedule(tw, timeo);

         spin_unlock(lock);
--
2.25.1
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help