Re: [PATCH] net: procfs: Fix RCU stall and soft lockup in ptype_seq_next()
From: YinFengwei <hidden>
Date: 2026-02-02 01:04:33
Also in: lkml
Hi Eric,
On Sat, Jan 31, 2026 at 6:50 PM Eric Dumazet [off-list ref] wrote:
> On Sat, Jan 31, 2026 at 6:41 PM Willem de Bruijn [off-list ref] wrote:
> > Jakub Kicinski wrote:
> > > On Wed, 28 Jan 2026 15:03:59 +0800 fengwei_yin@linux.alibaba.com wrote:
> > > > The root cause is in ptype_seq_next(): when iterating over packet
> > > > types, it's possible that a packet type entry (pt) has been removed,
> > > > its dev set to NULL, and pt->af_packet_net is not initialized. In that
> > > > case, the function may return the same 'nxt' pointer indefinitely.
> > > > This results in an infinite loop under RCU read-side critical section,
> > > > causing an RCU stall and eventually a soft lockup.
> > > >
> > > > Fix the issue by properly handling the case where 'nxt' points to an
> > > > empty list, ensuring forward progress in the iterator.
> > > >
> > > > @@ -247,7 +247,7 @@ static void *ptype_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > > >          if (pt->af_packet_net) {
> > > >  net_ptype_all:
> > > > -                if (nxt != &net->ptype_all && nxt != &net->ptype_specific)
> > > > +                if (!list_empty(nxt) && nxt != &net->ptype_all && nxt != &net->ptype_specific)
> > > >                          goto found;
> > > >                  if (nxt == &net->ptype_all) {
> > > > @@ -267,6 +267,9 @@ static void *ptype_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> > > >                          return NULL;
> > > >                  nxt = ptype_base[hash].next;
> > > >          }
> > > > +
> > > > +        if (list_empty(nxt))
> > > > +                return NULL;
> > > >  found:
> > > >          return list_entry(nxt, struct packet_type, list);
> > > >  }
> > >
> > > I'm not sure this fix works, TBH, we're dealing with an RCU list here.
> > > The elements are not deleted with list_del_init(), so they won't look
> > > "empty".
> > >
> > > If the pt entries are under RCU protection I think the issue is that
> > > af_packet is clearing pt->dev before waiting for the grace period to
> > > expire. Willem, is there a reason for that or just convenience?
> >
> > That would be wrong. Do we see it doing that somewhere?
> >
> > These handlers should get removed with dev_remove_pack. Or
> > __dev_remove_pack and observe the RCU grace period some other way.
> >
> > I can review these, but was not aware of any abuses.
>
> packet_notifier()
>
>         case NETDEV_DOWN:
>                 if (dev->ifindex == po->ifindex) {
>                         spin_lock(&po->bind_lock);
>                         if (packet_sock_flag(po, PACKET_SOCK_RUNNING)) {
>                                 __unregister_prot_hook(sk, false); /* removed without a synchronize_rcu() */
>                                 sk->sk_err = ENETDOWN;
>                                 if (!sock_flag(sk, SOCK_DEAD))
>                                         sk_error_report(sk);
>                         }
>                         if (msg == NETDEV_UNREGISTER) {
>                                 packet_cached_dev_reset(po);
>                                 WRITE_ONCE(po->ifindex, -1);
>                                 netdev_put(po->prot_hook.dev, &po->prot_hook.dev_tracker);
>                                 po->prot_hook.dev = NULL; // pointer set to NULL
Yes. This line is the main problem; it is what triggers the RCU stall.
>                         }
>                         spin_unlock(&po->bind_lock);
>                 }
>                 break;
>
> And other places as well...
>
> I would suggest adding proper RCU protection to prot_hook.dev
Agreed. Using RCU to protect prot_hook.dev is the best fix. I saw that you
have already sent a patch for it. I will give it a try and report back.

Thanks.

Regards
Yin, Fengwei
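
For reference, Jakub's remark that RCU-deleted elements "won't look empty" can be illustrated with a minimal userspace sketch. This is not kernel code: the struct and helpers below are simplified stand-ins written for illustration (no READ_ONCE()/WRITE_ONCE(), no real RCU), and the names del_init_style(), del_rcu_style() and removed_entry_looks_empty() are hypothetical. It only assumes that an RCU-style delete unlinks the entry while leaving its ->next pointer intact, whereas a list_del_init()-style delete re-initializes the entry to point at itself.

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Illustrative poison value for ->prev after an RCU-style delete. */
#define DEMO_POISON ((struct list_head *)0x122)

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Same test that list_empty() performs: does ->next point back at itself? */
static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_demo(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink the entry from its neighbours without touching the entry itself. */
static void unlink_entry(struct list_head *e)
{
	e->next->prev = e->prev;
	e->prev->next = e->next;
}

/* list_del_init()-like: the entry points at itself afterwards. */
static void del_init_style(struct list_head *e)
{
	unlink_entry(e);
	init_list_head(e);
}

/* list_del_rcu()-like: ->next is kept so concurrent readers can move on. */
static void del_rcu_style(struct list_head *e)
{
	unlink_entry(e);
	e->prev = DEMO_POISON;
}

/* Returns what list_is_empty() reports for an entry after it is removed. */
static int removed_entry_looks_empty(int use_del_init)
{
	struct list_head head, a, b;

	init_list_head(&head);
	list_add_tail_demo(&a, &head);
	list_add_tail_demo(&b, &head);

	if (use_del_init)
		del_init_style(&a);
	else
		del_rcu_style(&a);

	return list_is_empty(&a);
}
```

In this sketch removed_entry_looks_empty(0) reports that the RCU-style removed entry is not empty, which is why a list_empty(nxt) check on such an entry would not detect the removal.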