Re: [PATCH] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
From: Eric Dumazet <edumazet@google.com>
Date: 2026-06-24 17:22:15
Also in:
lkml
On Wed, Jun 24, 2026 at 10:12 AM Pengfei Zhang [off-list ref] wrote:
From: Pengfei Zhang <redacted>

inet6_dump_fib() saves its progress in cb->args[1] as a positional index
within the current hash chain. Between batches the RTNL lock is released,
so a concurrent fib6_new_table() can insert a new table at the chain head,
shifting all existing entries. The saved index then lands on a different
table, causing fib6_dump_table() to set w->root to the wrong table while
w->node still points into the previous one. fib6_walk_continue()
dereferences w->node->parent (NULL) and panics:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:fib6_walk_continue+0x6e/0x170
  Call Trace:
   <TASK>
   fib6_dump_table.isra.0+0xc5/0x240
   inet6_dump_fib+0xf6/0x420
   rtnl_dumpit+0x30/0xa0
   netlink_dump+0x15b/0x460
   netlink_recvmsg+0x1d6/0x2a0
   ____sys_recvmsg+0x17a/0x190

Fix by storing tb->tb6_id in cb->args[1] instead of a positional index.
On resume, skip entries until the id matches; a concurrent head-insert
can never match the saved id, so the walker always resumes on the correct
table.

Signed-off-by: Pengfei Zhang <redacted>

Patch looks good, but you forgot to add a Fixes: tag
Perhaps:
Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
---
The same crash was independently reported in a production environment
(kernel 5.15.137, triggered by ovs-vswitchd issuing RTM_GETROUTE):

  https://lkml.iu.edu/hypermail/linux/kernel/2402.3/02068.html

The crash is probabilistic and occurs in fib6_walk_continue() at the
FWS_U state:

    case FWS_U:
        if (fn == w->root)
            return 0;
        pn = rcu_dereference_protected(fn->parent, 1);
        left = rcu_dereference_protected(pn->left, 1);  /* crash here */

The crash dump shows fn->parent is NULL. At first glance this looks like
fn is a leaf node whose parent was freed, but closer inspection of the
walker state reveals fn->fn_flags has RTN_ROOT set; fn is itself a root
node of a routing table, not a child node. A root node has no parent by
definition, so fn->parent == NULL is correct for that node.

The real question is why fn != w->root despite fn being a root. The
answer is that w->root and fn belong to *different* tables: w->node
(which became fn during traversal) still references a node from the
table that was being dumped when the batch suspended, while w->root was
silently redirected to a different table on resume.

This misdirection happens because inet6_dump_fib() uses a positional
index to resume across batches. Consider a hash slot containing two
tables [A(pos=0), B(pos=1)] where B is large enough to require multiple
batches. On the first batch, B suspends mid-walk and the loop saves:

    cb->args[1] = e;    /* e=1, position of B in the chain */

The RTNL lock is then released. At this point a concurrent
fib6_new_table() inserts table C at the chain head via
hlist_add_head_rcu(), making the chain [C(pos=0), A(pos=1), B(pos=2)].
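
The index shift can be replayed in user space. The following is a minimal
sketch, not kernel code: struct table, chain_add_head() and
resume_by_pos() are hypothetical stand-ins for struct fib6_table,
hlist_add_head_rcu() and the pre-patch skip loop, and the table ids are
arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct fib6_table. */
struct table {
	uint32_t id;
	struct table *next;
};

/* Head insertion, the effect hlist_add_head_rcu() has on the chain. */
static void chain_add_head(struct table **head, struct table *t)
{
	t->next = *head;
	*head = t;
}

/* Pre-patch resume: return the entry at saved position s_e, or NULL. */
static struct table *resume_by_pos(struct table *head, unsigned int s_e)
{
	unsigned int e = 0;

	for (struct table *t = head; t; t = t->next, e++)
		if (e >= s_e)
			return t;
	return NULL;
}

/* Replays the [A, B] -> [C, A, B] scenario described above. */
static void demo_position_drift(void)
{
	struct table a = { .id = 1 }, b = { .id = 2 }, c = { .id = 3 };
	struct table *head = NULL;

	chain_add_head(&head, &b);
	chain_add_head(&head, &a);		/* chain: [A, B] */
	assert(resume_by_pos(head, 1) == &b);	/* batch 1 suspended on B */

	chain_add_head(&head, &c);		/* chain: [C, A, B] */
	assert(resume_by_pos(head, 1) == &a);	/* batch 2 resumes on A */
}
```

Calling demo_position_drift() from a test harness or main() exercises both
lookups; the saved position stays at 1 while the table it refers to changes.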

On the next batch, inet6_dump_fib() resumes with s_e=1 and iterates:

    s_e = cb->args[1];      /* s_e = 1 */
    hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
        if (e < s_e)        /* skip C at pos=0 */
            goto next;
        /* e=1: tb now points to A, not B */
        fib6_dump_table(tb, skb, cb);   /* called with wrong table A */
    }

Inside fib6_dump_table(), w->root is unconditionally overwritten before
the resume branch is entered:

    w->root = &table->tb6_root;     /* now A's root */
    /* ... */
    } else {
        int sernum = READ_ONCE(w->root->fn_sernum);  /* A's sernum */
        if (cb->args[5] != sernum) {
            /* sernum changed: safe reset, w->node = w->root (A) */
            w->node = w->root;
        } else {
            /* sernum unchanged: w->node untouched, still in B */
            w->skip = 0;
        }
        fib6_walk_continue(w);  /* sernum equal: w->root=A, w->node=B */
    }

The sernum guard was intended to detect tree modifications and reset the
walk, but here the two tables happen to share the same fn_sernum value
(a global flush had previously unified them), so the guard does not fire
and w->node is left pointing into B's tree. From this point w->root and
w->node belong to different tables.
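
The guard's blind spot can be modelled in user space. This is a simplified
sketch, not the kernel's walker state machine: struct node, struct walker,
walker_resume() and walk_up() are hypothetical reductions of struct
fib6_node, struct fib6_walker, the resume branch of fib6_dump_table() and
the upward step of FWS_U. walk_up() returns -1 at the point where the
kernel would dereference a NULL parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct fib6_node. */
struct node {
	struct node *parent;	/* NULL for a table root */
	int sernum;		/* only consulted on roots in this model */
};

/* Hypothetical, reduced stand-in for struct fib6_walker. */
struct walker {
	struct node *root;
	struct node *node;
};

/* Resume branch: root is always replaced, node only on a sernum change. */
static void walker_resume(struct walker *w, struct node *table_root,
			  int saved_sernum)
{
	w->root = table_root;
	if (saved_sernum != w->root->sernum)
		w->node = w->root;	/* tree changed: restart at the root */
	/* else: w->node is left untouched */
}

/* Walk upward; 0 when w->root is reached, -1 on a parentless node. */
static int walk_up(const struct walker *w)
{
	const struct node *fn = w->node;

	for (;;) {
		if (fn == w->root)
			return 0;
		if (!fn->parent)
			return -1;	/* fn is some other table's root */
		fn = fn->parent;
	}
}

/* Tables A and B share sernum 7; the walk was suspended inside B. */
static void demo_sernum_guard(void)
{
	struct node a_root = { NULL, 7 }, b_root = { NULL, 7 };
	struct node b_leaf = { &b_root, 0 };
	struct walker w = { &b_root, &b_leaf };

	walker_resume(&w, &a_root, 7);	/* resumed on the wrong table */
	assert(w.node == &b_leaf);	/* guard did not fire */
	assert(walk_up(&w) == -1);	/* walk runs past B's root */
}
```

With differing sernum values the same call resets w->node to A's root and
the walk terminates normally, which matches the "safe reset" branch above.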

When fib6_walk_continue() traverses upward and reaches B's root node
(fn->fn_flags & RTN_ROOT), the exit check fails:

    if (fn == w->root)      /* B's root != A's root, check fails */
        return 0;
    pn = fn->parent;        /* B's root has no parent: pn == NULL */
    left = pn->left;        /* NULL deref -> crash */

 net/ipv6/ip6_fib.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index fc95738de..bda492634 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -636,11 +636,11 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	};
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
-	unsigned int e = 0, s_e;
 	struct hlist_head *head;
 	struct fib6_walker *w;
 	struct fib6_table *tb;
 	unsigned int h, s_h;
+	u32 s_id;
 	int err = 0;
 
 	rcu_read_lock();
@@ -701,23 +701,22 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 
 	s_h = cb->args[0];
-	s_e = cb->args[1];
+	s_id = cb->args[1];
 
-	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_e = 0) {
-		e = 0;
+	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_id = 0) {
 		head = &net->ipv6.fib_table_hash[h];
 		hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
-			if (e < s_e)
-				goto next;
+			if (s_id && tb->tb6_id != s_id)
+				continue;
+			s_id = 0;
+
+			cb->args[1] = tb->tb6_id;
 			err = fib6_dump_table(tb, skb, cb);
 			if (err != 0)
 				goto out;
-next:
-			e++;
 		}
 	}
 out:
-	cb->args[1] = e;
 	cb->args[0] = h;
 
 unlock:
-- 
2.34.1
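
The patched loop can be exercised the same way in user space. Again a
sketch under assumptions rather than the kernel implementation: HASHSZ,
struct table, struct dump_cb, dump_table() and dump_batch() are
hypothetical stand-ins for FIB6_TABLE_HASHSZ, struct fib6_table, struct
netlink_callback, fib6_dump_table() and one invocation of
inet6_dump_fib(); batches_left is an invented counter that makes a table
span several batches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASHSZ 2	/* hypothetical; stands in for FIB6_TABLE_HASHSZ */

struct table {
	uint32_t id;		/* nonzero, like tb6_id */
	int batches_left;	/* hypothetical: batches needed to dump it */
	struct table *next;
};

struct dump_cb {
	unsigned long args[2];	/* args[0]: bucket, args[1]: table id */
};

/* Hypothetical fib6_dump_table(): nonzero means "suspended". */
static int dump_table(struct table *tb)
{
	if (tb->batches_left > 0)
		tb->batches_left--;
	return tb->batches_left > 0;
}

/* One batch; returns the table it suspended on, or NULL when done. */
static struct table *dump_batch(struct table *buckets[HASHSZ],
				struct dump_cb *cb)
{
	unsigned int h = (unsigned int)cb->args[0];
	uint32_t s_id = (uint32_t)cb->args[1];

	for (; h < HASHSZ; h++, s_id = 0) {
		for (struct table *tb = buckets[h]; tb; tb = tb->next) {
			if (s_id && tb->id != s_id)
				continue;	/* not the saved table yet */
			s_id = 0;
			cb->args[1] = tb->id;
			if (dump_table(tb)) {
				cb->args[0] = h;
				return tb;
			}
		}
	}
	cb->args[0] = h;
	return NULL;
}

/* [A, B] becomes [C, A, B] between batches; the resume still finds B. */
static void demo_id_resume(void)
{
	struct table b = { .id = 2, .batches_left = 3, .next = NULL };
	struct table a = { .id = 1, .batches_left = 1, .next = &b };
	struct table c = { .id = 3, .batches_left = 1, .next = &a };
	struct table *buckets[HASHSZ] = { &a, NULL };
	struct dump_cb cb = { { 0, 0 } };
	struct table *first, *second, *third;

	first = dump_batch(buckets, &cb);	/* dumps A, suspends on B */
	buckets[0] = &c;			/* concurrent head insert */
	second = dump_batch(buckets, &cb);	/* skips C and A */
	third = dump_batch(buckets, &cb);	/* B completes */

	assert(first == &b && second == &b && third == NULL);
}
```

Keying the resume on the id rather than on the position is what keeps the
second batch on B in this model, whatever was inserted ahead of it.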