Thread: 11 messages, 5 authors, 2025-09-03

Re: [PATCH net] inet: Avoid established lookup missing active sk

From: Jason Xing <hidden>
Date: 2025-09-03 06:53:34

On Wed, Sep 3, 2025 at 2:40 PM Eric Dumazet [off-list ref] wrote:
On Tue, Sep 2, 2025 at 7:46 PM Xuanqiang Luo [off-list ref] wrote:
From: Xuanqiang Luo <redacted>

Since the lookup of sk in ehash is lockless, when one CPU is performing a
lookup while another CPU is executing delete and insert operations
(deleting reqsk and inserting sk), the lookup CPU may find neither of
them. If sk cannot be found, an RST may be sent.

The call trace map is drawn as follows:
   CPU 0                           CPU 1
   -----                           -----
                                spin_lock()
                                sk_nulls_del_node_init_rcu(osk)
__inet_lookup_established()
                                __sk_nulls_add_node_rcu(sk, list)
                                spin_unlock()

We can try using spin_lock()/spin_unlock() to wait for ehash updates
(ensuring all deletions and insertions are completed) after a failed
lookup in ehash, then look up sk again after the update. Since the sk
expected to be found is unlikely to encounter the aforementioned scenario
multiple times consecutively, a single retry is enough.
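
The retry idea described above can be sketched in a small userspace C
model. This is not kernel code and not the patch itself: bucket_lock(),
lookup_lockless(), lookup_retry_once() and the two writer_*() helpers are
hypothetical names, an array of keys stands in for the ehash chain, and a
C11 atomic_flag stands in for the bucket spinlock. The writer is split
into two halves only so that the window between delete and insert can be
observed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SLOTS 4
#define EMPTY (-1)

/* Hypothetical stand-in for one ehash bucket. */
struct bucket {
	atomic_flag lock;	/* models the bucket spinlock */
	int keys[SLOTS];	/* EMPTY, or the key of a hashed socket */
};

static void bucket_lock(struct bucket *b)
{
	/* Spin until the flag was previously clear, i.e. we own the lock. */
	while (atomic_flag_test_and_set_explicit(&b->lock, memory_order_acquire))
		;
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->lock, memory_order_release);
}

/* Lockless scan: may run while a writer is between delete and insert. */
static bool lookup_lockless(const struct bucket *b, int key)
{
	for (int i = 0; i < SLOTS; i++)
		if (b->keys[i] == key)
			return true;
	return false;
}

/* On a miss, wait for any in-flight update to finish, then retry once. */
static bool lookup_retry_once(struct bucket *b, int key)
{
	if (lookup_lockless(b, key))
		return true;
	bucket_lock(b);		/* acquired only after a writer has unlocked */
	bucket_unlock(b);
	return lookup_lockless(b, key);
}

/* First half of the writer: take the lock and remove the old entry. */
static void writer_delete_old(struct bucket *b, int slot)
{
	bucket_lock(b);
	b->keys[slot] = EMPTY;
}

/* Second half of the writer: insert the new entry and drop the lock. */
static void writer_insert_new(struct bucket *b, int slot, int key)
{
	b->keys[slot] = key;
	bucket_unlock(b);
}
```

In this model, a lockless lookup issued between writer_delete_old() and
writer_insert_new() misses the key, while lookup_retry_once() called from
another thread would block on the lock until the insert has completed and
then find it. The model ignores RCU and the memory-ordering details of the
real list primitives.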
No need for a lock really...
- add the new node (with a temporary 'wrong' nulls value),
- delete the old node
- replace the nulls value by the expected one.
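
One possible reading of the three steps above, as a single-threaded
userspace C model rather than kernel code. All names here (MAKE_NULLS,
add_tail, del_node, set_nulls, lookup) are hypothetical, inserting at the
tail is an assumption, and RCU and memory barriers are not modelled. The
end-of-chain marker carries a value, as in the kernel's nulls lists; a
reader that reaches an end marker with an unexpected value treats the
result as "retry" instead of a definitive miss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* End-of-chain marker: low bit set, remaining bits carry a value. */
#define MAKE_NULLS(v)  ((struct node *)(uintptr_t)((((uintptr_t)(v)) << 1) | 1))
#define IS_NULLS(p)    (((uintptr_t)(p)) & 1)
#define NULLS_VALUE(p) (((uintptr_t)(p)) >> 1)

struct node { int key; struct node *next; };
struct bucket { struct node *first; };

enum result { FOUND, MISS, RETRY };

/* Reader: walk the chain, then validate the end marker against 'slot'. */
static enum result lookup(const struct bucket *b, unsigned long slot, int key)
{
	const struct node *n;

	for (n = b->first; !IS_NULLS(n); n = n->next)
		if (n->key == key)
			return FOUND;
	return NULLS_VALUE(n) == slot ? MISS : RETRY;
}

/* Return the link that currently holds the end marker. */
static struct node **tail_link(struct bucket *b)
{
	struct node **pp = &b->first;

	while (!IS_NULLS(*pp))
		pp = &(*pp)->next;
	return pp;
}

/* Step 1: append 'n', ending the chain with the given nulls value. */
static void add_tail(struct bucket *b, struct node *n, unsigned long nulls)
{
	struct node **pp = tail_link(b);

	n->next = MAKE_NULLS(nulls);
	*pp = n;
}

/* Step 2: unlink 'old'; its own next pointer is left untouched. */
static void del_node(struct bucket *b, struct node *old)
{
	struct node **pp = &b->first;

	while (!IS_NULLS(*pp) && *pp != old)
		pp = &(*pp)->next;
	if (*pp == old)
		*pp = old->next;
}

/* Step 3: restore the expected nulls value at the end of the chain. */
static void set_nulls(struct bucket *b, unsigned long slot)
{
	*tail_link(b) = MAKE_NULLS(slot);
}
```

With an old node and a new node sharing the same key, calling
add_tail(new, wrong value), del_node(old), set_nulls(slot) in that order
keeps lookup() for that key returning FOUND after every step, and a walk
that runs off the end during the transition returns RETRY, not MISS.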
Yes. The plan is simple enough to fix this particular issue, and I
verified it in production long ago. Sadly, the following patch got
reverted...
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3f4ca5fafc08881d7a57daa20449d171f2887043

Thanks,
Jason