
Re: [PATCH] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump

From: Eric Dumazet <edumazet@google.com>
Date: 2026-06-24 17:22:15
Also in: lkml

On Wed, Jun 24, 2026 at 10:12 AM Pengfei Zhang [off-list ref] wrote:
From: Pengfei Zhang <redacted>

inet6_dump_fib() saves its progress in cb->args[1] as a positional
index within the current hash chain.  Between batches the RTNL lock
is released, so a concurrent fib6_new_table() can insert a new table
at the chain head, shifting all existing entries.  The saved index
then lands on a different table, causing fib6_dump_table() to set
w->root to the wrong table while w->node still points into the
previous one.  fib6_walk_continue() dereferences w->node->parent
(NULL) and panics:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:fib6_walk_continue+0x6e/0x170
  Call Trace:
   <TASK>
   fib6_dump_table.isra.0+0xc5/0x240
   inet6_dump_fib+0xf6/0x420
   rtnl_dumpit+0x30/0xa0
   netlink_dump+0x15b/0x460
   netlink_recvmsg+0x1d6/0x2a0
   ____sys_recvmsg+0x17a/0x190

Fix by storing tb->tb6_id in cb->args[1] instead of a positional
index.  On resume, skip entries until the id matches; a concurrent
head-insert can never match the saved id, so the walker always
resumes on the correct table.

Signed-off-by: Pengfei Zhang <redacted>

Patch looks good, but you forgot to add a Fixes: tag.

Perhaps:

Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
---
The same crash was independently reported in a production environment
(kernel 5.15.137, triggered by ovs-vswitchd issuing RTM_GETROUTE):
  https://lkml.iu.edu/hypermail/linux/kernel/2402.3/02068.html

The crash is probabilistic and occurs in fib6_walk_continue() at the
FWS_U state:

  case FWS_U:
      if (fn == w->root)
          return 0;
      pn = rcu_dereference_protected(fn->parent, 1);
      left = rcu_dereference_protected(pn->left, 1);  /* crash here */

The crash dump shows fn->parent is NULL.  At first glance this looks
like fn is a leaf node whose parent was freed, but closer inspection of
the walker state reveals fn->fn_flags has RTN_ROOT set: fn is itself
the root node of a routing table, not a child node.  A root node has no
parent by definition, so fn->parent == NULL is correct for that node.

The real question is why fn != w->root despite fn being a root.  The
answer is that w->root and fn belong to *different* tables: w->node
(which became fn during traversal) still references a node from the
table that was being dumped when the batch suspended, while w->root was
silently redirected to a different table on resume.

This misdirection happens because inet6_dump_fib() uses a positional
index to resume across batches.  Consider a hash slot containing two
tables [A(pos=0), B(pos=1)] where B is large enough to require multiple
batches.  On the first batch, B suspends mid-walk and the loop saves:

  cb->args[1] = e;   /* e=1, position of B in the chain */

The RTNL lock is then released.  At this point a concurrent
fib6_new_table() inserts table C at the chain head via
hlist_add_head_rcu(), making the chain [C(pos=0), A(pos=1), B(pos=2)].

On the next batch, inet6_dump_fib() resumes with s_e=1 and iterates:

  s_e = cb->args[1];   /* s_e = 1 */
  hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
      if (e < s_e)     /* skip C at pos=0 */
          goto next;
      /* e=1: tb now points to A, not B */
      fib6_dump_table(tb, skb, cb);   /* called with wrong table A */
  next:
      e++;
  }
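
The index shift can be reproduced outside the kernel.  The following is
only a user-space sketch, not kernel code: struct demo_table and the
demo_*() helpers are hypothetical stand-ins for struct fib6_table, the
head insert done by hlist_add_head_rcu(), and the two resume strategies
(position versus table id).

```c
#include <stddef.h>

/* Hypothetical stand-in for struct fib6_table: only the id and the link. */
struct demo_table {
    unsigned int id;
    struct demo_table *next;
};

/* Insert at the chain head, the way a newly created table is added. */
static struct demo_table *demo_add_head(struct demo_table *head,
                                        struct demo_table *t)
{
    t->next = head;
    return t;
}

/* Old scheme: resume at the s_e-th entry of the chain. */
static struct demo_table *demo_resume_by_pos(struct demo_table *head,
                                             unsigned int s_e)
{
    unsigned int e = 0;
    struct demo_table *t;

    for (t = head; t; t = t->next, e++)
        if (e >= s_e)
            return t;
    return NULL;
}

/* Patched scheme: skip entries until the saved id matches (0 = no skip). */
static struct demo_table *demo_resume_by_id(struct demo_table *head,
                                            unsigned int s_id)
{
    struct demo_table *t;

    for (t = head; t; t = t->next)
        if (!s_id || t->id == s_id)
            return t;
    return NULL;
}
```

With a chain [A, B] and B saved as the resume point, both helpers return
B.  After demo_add_head() puts C in front, the positional lookup with
s_e=1 returns A, while the id lookup still returns B.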

Inside fib6_dump_table(), w->root is unconditionally overwritten
before the resume branch is entered:

  w->root = &table->tb6_root;        /* now A's root              */
  /* ... */
  } else {
      int sernum = READ_ONCE(w->root->fn_sernum);  /* A's sernum  */
      if (cb->args[5] != sernum) {
          /* sernum changed: safe reset, w->node = w->root (A)    */
          w->node = w->root;
      } else {
          /* sernum unchanged: w->node untouched, still in B       */
          w->skip = 0;
      }
      fib6_walk_continue(w);   /* sernum equal: w->root=A, w->node=B */
  }

The sernum guard was intended to detect tree modifications and reset
the walk, but here the two tables happen to share the same fn_sernum
value (a global flush had previously unified them), so the guard does
not fire and w->node is left pointing into B's tree.
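
To make the guard's blind spot concrete, here is a reduced user-space
sketch of the resume decision.  struct demo_root, struct demo_walker and
demo_resume() are hypothetical names for illustration; only the
comparison of a saved sernum against the new root's sernum is taken from
the code above.

```c
#include <stddef.h>

/* Hypothetical stand-in for a table root carrying fn_sernum. */
struct demo_root {
    int sernum;
};

/* Hypothetical stand-in for struct fib6_walker. */
struct demo_walker {
    const struct demo_root *root;   /* root the walk is expected to stop at */
    const void *node;               /* current position, possibly elsewhere */
};

/*
 * Mirror of the resume branch: root is always overwritten, node is reset
 * only when the saved sernum differs from the new root's sernum.
 * Returns 1 if the walk was reset, 0 if node was left untouched.
 */
static int demo_resume(struct demo_walker *w, const struct demo_root *root,
                       int saved_sernum)
{
    w->root = root;
    if (saved_sernum != root->sernum) {
        w->node = root;
        return 1;
    }
    return 0;
}
```

If A and B carry the same sernum, demo_resume() called with A's root
returns 0 and leaves node pointing into B, which is the mixed state
described above.  Only a differing sernum forces the reset.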

From this point w->root and w->node belong to different tables.  When
fib6_walk_continue() traverses upward and reaches B's root node
(fn->fn_flags & RTN_ROOT), the exit check fails and the NULL parent
pointer is dereferenced:

  if (fn == w->root)   /* B's root != A's root, check fails */
      return 0;
  pn = fn->parent;     /* B's root has no parent: pn == NULL */
  left = pn->left;     /* NULL deref -> crash */
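
The upward walk can also be modelled in user space.  This is a sketch
under simplifying assumptions: struct demo_node keeps only the parent
link, and demo_walk_up() is a hypothetical helper that returns -1 at the
point where the kernel code would dereference the NULL parent instead of
crashing.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct fib6_node: parent link only. */
struct demo_node {
    struct demo_node *parent;
};

/*
 * Walk upward from 'node' until 'root' is reached, mirroring the FWS_U
 * exit check.  Returns 0 on a clean exit at 'root', or -1 where the
 * kernel would go on to dereference a NULL parent.
 */
static int demo_walk_up(const struct demo_node *root,
                        const struct demo_node *node)
{
    const struct demo_node *fn = node;

    for (;;) {
        if (fn == root)
            return 0;           /* normal termination */
        if (!fn->parent)
            return -1;          /* reached a root that is not w->root */
        fn = fn->parent;
    }
}
```

Walking up from a node in B with B's root terminates with 0.  The same
walk with A's root passed as 'root' runs past B's root and returns -1.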

 net/ipv6/ip6_fib.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index fc95738de..bda492634 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -636,11 +636,11 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
        };
        const struct nlmsghdr *nlh = cb->nlh;
        struct net *net = sock_net(skb->sk);
-       unsigned int e = 0, s_e;
        struct hlist_head *head;
        struct fib6_walker *w;
        struct fib6_table *tb;
        unsigned int h, s_h;
+       u32 s_id;
        int err = 0;

        rcu_read_lock();
@@ -701,23 +701,22 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
        }

        s_h = cb->args[0];
-       s_e = cb->args[1];
+       s_id = cb->args[1];

-       for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_e = 0) {
-               e = 0;
+       for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_id = 0) {
                head = &net->ipv6.fib_table_hash[h];
                hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
-                       if (e < s_e)
-                               goto next;
+                       if (s_id && tb->tb6_id != s_id)
+                               continue;
+                       s_id = 0;
+
+                       cb->args[1] = tb->tb6_id;
                        err = fib6_dump_table(tb, skb, cb);
                        if (err != 0)
                                goto out;
-next:
-                       e++;
                }
        }
 out:
-       cb->args[1] = e;
        cb->args[0] = h;

 unlock:
--
2.34.1