Re: [PATCH net] inet: Avoid established lookup missing active sk
From: Jason Xing <hidden>
Date: 2025-09-03 06:53:34
On Wed, Sep 3, 2025 at 2:40 PM Eric Dumazet [off-list ref] wrote:
> On Tue, Sep 2, 2025 at 7:46 PM Xuanqiang Luo [off-list ref] wrote:
> > From: Xuanqiang Luo <redacted>
> >
> > Since the lookup of sk in ehash is lockless, when one CPU is performing a lookup while another CPU is executing delete and insert operations (deleting reqsk and inserting sk), the lookup CPU may miss either of them, if sk cannot be found, an RST may be sent.
> >
> > The call trace map is drawn as follows:
> >    CPU 0                           CPU 1
> >    -----                           -----
> >                                 spin_lock()
> >                                 sk_nulls_del_node_init_rcu(osk)
> > __inet_lookup_established()
> >                                 __sk_nulls_add_node_rcu(sk, list)
> >                                 spin_unlock()
> >
> > We can try using spin_lock()/spin_unlock() to wait for ehash updates (ensuring all deletions and insertions are completed) after a failed lookup in ehash, then lookup sk again after the update. Since the sk expected to be found is unlikely to encounter the aforementioned scenario multiple times consecutively, we only need one update.
>
> No need for a lock really...
>
> - add the new node (with a temporary 'wrong' nulls value),
> - delete the old node
> - replace the nulls value by the expected one.
Yes. The plan is simple enough to fix this particular issue, and I verified it in production long ago. Sadly, the following patch got reverted...

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3f4ca5fafc08881d7a57daa20449d171f2887043

Thanks,
Jason
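
For readers following the thread, the three-step ordering quoted above can be modelled roughly in userspace. This is only a single-threaded toy sketch, not kernel code: `struct node`, `lookup()`, `SLOT`, `WRONG_SLOT` and both helper functions are hypothetical names, the new node is assumed to be added at the tail of the chain, and there are no RCU primitives or memory barriers. It only shows what a reader walking from the head would observe after each step, compared with the delete-then-insert window from the call trace.

```c
#include <assert.h>
#include <stdint.h>

/* Toy userspace model of a nulls-terminated hash chain (NOT kernel code). */
struct node {
	int key;
	uintptr_t next;	/* next node, or an odd "nulls" end marker */
};

#define NULLS(v)	(((uintptr_t)(v) << 1) | 1)
#define IS_NULLS(p)	((p) & 1)
#define NULLS_VAL(p)	((p) >> 1)

#define SLOT		3u	/* hypothetical bucket index */
#define WRONG_SLOT	99u	/* hypothetical temporary 'wrong' value */

enum result { FOUND, MISS, RETRY };

/* Walk the chain. Ending on an unexpected nulls value means "the chain
 * changed under us, retry" rather than "not found". */
static enum result lookup(uintptr_t head, int key)
{
	uintptr_t p = head;

	while (!IS_NULLS(p)) {
		struct node *n = (struct node *)p;

		if (n->key == key)
			return FOUND;
		p = n->next;
	}
	return NULLS_VAL(p) == SLOT ? MISS : RETRY;
}

/* Delete-then-insert: what a reader sees between the two steps. */
static enum result delete_then_insert_window(void)
{
	struct node osk = { .key = 42, .next = NULLS(SLOT) };
	uintptr_t head = (uintptr_t)&osk;

	head = osk.next;		/* delete osk: chain is now empty */
	return lookup(head, 42);	/* reader concludes "not found" */
}

/* Add with wrong nulls, delete, fix nulls. Returns 1 if a reader never
 * gets a definitive MISS for key 42 at any intermediate step. */
static int three_step_never_misses(void)
{
	struct node osk = { .key = 42, .next = NULLS(SLOT) };
	struct node sk  = { .key = 42, .next = NULLS(WRONG_SLOT) };
	uintptr_t head = (uintptr_t)&osk;
	int ok = 1;

	osk.next = (uintptr_t)&sk;	/* 1. add sk at the tail */
	ok &= lookup(head, 42) == FOUND;
	ok &= lookup(head, 7) == RETRY;	/* tail reached: retry, not miss */

	head = osk.next;		/* 2. delete osk */
	ok &= lookup(head, 42) == FOUND;
	ok &= lookup(head, 7) == RETRY;

	sk.next = NULLS(SLOT);		/* 3. restore the expected value */
	ok &= lookup(head, 42) == FOUND;
	ok &= lookup(head, 7) == MISS;
	return ok;
}
```

In this model `delete_then_insert_window()` yields `MISS`, which corresponds to the case where an RST may be sent, while `three_step_never_misses()` yields 1: any walk that reaches the end of the chain during the update sees the temporary value and would retry instead of giving up.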