Thread: 10 messages, 4 authors, 2026-01-31

Re: [PATCH net 0/2] gve: fix crashes on invalid TX queue indices

From: Eric Dumazet <edumazet@google.com>
Date: 2026-01-30 20:56:20
Also in: lkml, stable
Subsystem: networking [general], the rest · Maintainers: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds

On Thu, Jan 8, 2026 at 9:53 PM Ankit Garg [off-list ref] wrote:
On Thu, Jan 8, 2026 at 8:37 AM Eric Dumazet [off-list ref] wrote:
On Thu, Jan 8, 2026 at 4:36 PM Ankit Garg [off-list ref] wrote:
On Tue, Jan 6, 2026 at 6:22 PM Jakub Kicinski [off-list ref] wrote:
On Mon,  5 Jan 2026 15:25:02 -0800 Joshua Washington wrote:
This series fixes a kernel panic in the GVE driver caused by
out-of-bounds array access when the network stack provides an invalid
TX queue index.
Do you know how? I seem to recall we had such issues due to bugs
in the qdisc layer, most of which were fixed.

Fixing this at the source, if possible, would be far preferable
to sprinkling this condition to all the drivers.
That matches our observation—we have encountered this panic on older
kernels (specifically Rocky Linux 8) but have not been able to
reproduce it on recent upstream kernels.
What is the kernel version used in Rocky Linux 8?
The kernel version where we observed this is 4.18.0 (full version
4.18.0-553.81.1+2.1.el8_10_ciq)
Note that the test against real_num_tx_queues is done before reaching
the Qdisc layer.

It might help to give a stack trace of a panic.
Crash happens in the sch_direct_xmit path per the trace.

I wonder if sch_direct_xmit is acting as an optimization to bypass the
queueing layer, and if that is somehow bypassing the queue index
checks you mentioned?

I'll try to dig a bit deeper into that specific flow, but here is the
trace in the meantime:
Jakub, the issue is that before 4.20, calling synchronize_rcu() instead
of synchronize_rcu_bh() was probably a bug. I suspect we had more
issues like that.

__dev_queue_xmit() takes rcu_read_lock_bh(), while the code that you
added in 2018 [1] to update the queue count in
netif_set_real_num_tx_queues() calls synchronize_net() (aka
synchronize_rcu()). In earlier kernels, that call could possibly return
too soon (say, on preemptible kernels).

[1] commit ac5b70198adc25c73fba28de4f78adcee8f6be0b
Author: Jakub Kicinski [off-list ref]
Date:   Mon Feb 12 21:35:31 2018 -0800

    net: fix race on decreasing number of TX queues

    netif_set_real_num_tx_queues() can be called when netdev is up.
    That usually happens when user requests change of number of
    channels/rings with ethtool -L.  The procedure for changing
    the number of queues involves resetting the qdiscs and setting
    dev->num_tx_queues to the new value.  When the new value is
    lower than the old one, extra care has to be taken to ensure
    ordering of accesses to the number of queues vs qdisc reset.

    Currently the queues are reset before new dev->num_tx_queues
    is assigned, leaving a window of time where packets can be
    enqueued onto the queues going down, leading to a likely
    crash in the drivers, since most drivers don't check if TX
    skbs are assigned to an active queue.

    Fixes: e6484930d7c7 ("net: allocate tx queues in register_netdevice")
    Signed-off-by: Jakub Kicinski [off-list ref]
    Signed-off-by: David S. Miller [off-list ref]

So perhaps a fix for pre-4.20 kernels would be the following (I kept the
synchronize_net() to be really cautious, and because I really do not
want to test):
diff --git a/net/core/dev.c b/net/core/dev.c
index 93243479085fb1d61031ed2136f5aee22d8f313d..4dd1db70561d35fe2097afc86764dd82bfd0bf27 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2636,6 +2636,7 @@ int netif_set_real_num_tx_queues(struct net_device *dev, unsigned int txq)

                if (disabling) {
                        synchronize_net();
+                       synchronize_rcu_bh();
                        qdisc_reset_all_tx_gt(dev, txq);
 #ifdef CONFIG_XPS
                        netif_reset_xps_queues_gt(dev, txq);
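
For readers following the reasoning, the suspected race can be sketched as a small single-threaded userspace model. This is only an illustration under stated assumptions, not kernel code: every model_* name and the MODEL_* flavor bits are hypothetical, the "grace period" is reduced to a flag, and the model assumes (as suspected above for pre-4.20 preemptible kernels) that waiting on the plain RCU flavor does not drain a reader running under rcu_read_lock_bh().

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_TXQ 8

/* Hypothetical flavor bits standing in for the separate pre-4.20 flavors. */
enum model_flavor {
	MODEL_RCU    = 1,	/* stands in for synchronize_net() / synchronize_rcu() */
	MODEL_RCU_BH = 2,	/* stands in for synchronize_rcu_bh() */
};

/* Hypothetical stand-in for the few net_device fields that matter here. */
struct model_dev {
	unsigned int real_num_tx_queues;
	bool queue_live[MODEL_MAX_TXQ];
	bool bh_reader_in_flight;	/* a transmitter inside a BH read-side section */
	unsigned int reader_txq;	/* queue index that transmitter already picked */
};

static void model_init(struct model_dev *dev, unsigned int txq)
{
	dev->real_num_tx_queues = txq;
	for (size_t i = 0; i < MODEL_MAX_TXQ; i++)
		dev->queue_live[i] = i < txq;
	dev->bh_reader_in_flight = false;
	dev->reader_txq = 0;
}

/* Transmitter picks a queue; the index is checked against the current count. */
static bool model_xmit_begin(struct model_dev *dev, unsigned int txq)
{
	if (txq >= dev->real_num_tx_queues)
		return false;
	dev->bh_reader_in_flight = true;
	dev->reader_txq = txq;
	return true;
}

/* Shrink path: publish the new count, "wait", then reset the extra queues. */
static void model_shrink(struct model_dev *dev, unsigned int txq,
			 unsigned int waited)
{
	dev->real_num_tx_queues = txq;
	if (waited & MODEL_RCU_BH)
		dev->bh_reader_in_flight = false;	/* reader finished before the reset */
	for (size_t i = txq; i < MODEL_MAX_TXQ; i++)
		dev->queue_live[i] = false;
}

/* Returns true if the transmitter used its queue while it was still live. */
static bool model_xmit_end(struct model_dev *dev)
{
	bool ok = !dev->bh_reader_in_flight || dev->queue_live[dev->reader_txq];

	dev->bh_reader_in_flight = false;
	return ok;
}

/* One transmitter on queue 3 races with a shrink from 4 queues to 2. */
static bool model_scenario(unsigned int waited)
{
	struct model_dev dev;

	model_init(&dev, 4);
	if (!model_xmit_begin(&dev, 3))
		return false;
	model_shrink(&dev, 2, waited);
	return model_xmit_end(&dev);
}
```

In this model, model_scenario(MODEL_RCU) reports that the transmitter touched a queue that had already been reset, while model_scenario(MODEL_RCU | MODEL_RCU_BH) reports a safe outcome, which mirrors the intent of adding synchronize_rcu_bh() next to synchronize_net() in the diff above.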