Re: [PATCH net 0/2] gve: fix crashes on invalid TX queue indices

From: Ankit Garg <hidden>
Date: 2026-01-08 20:53:26
Also in: lkml, stable

On Thu, Jan 8, 2026 at 8:37 AM Eric Dumazet [off-list ref] wrote:

On Thu, Jan 8, 2026 at 4:36 PM Ankit Garg [off-list ref] wrote:

On Tue, Jan 6, 2026 at 6:22 PM Jakub Kicinski [off-list ref] wrote:

On Mon,  5 Jan 2026 15:25:02 -0800 Joshua Washington wrote:

This series fixes a kernel panic in the GVE driver caused by
out-of-bounds array access when the network stack provides an invalid
TX queue index.

Do you know how? I seem to recall we had such issues due to bugs
in the qdisc layer, most of which were fixed.

Fixing this at the source, if possible, would be far preferable
to sprinkling this condition to all the drivers.

That matches our observation: we have encountered this panic on older
kernels (specifically Rocky Linux 8) but have not been able to
reproduce it on recent upstream kernels.

What is the kernel version used in Rocky Linux 8?

The kernel version where we observed this is 4.18.0 (full version
4.18.0-553.81.1+2.1.el8_10_ciq).

Note that the test against real_num_tx_queues is done before reaching
the Qdisc layer.

It might help to give a stack trace of a panic.

The crash happens in the sch_direct_xmit path, per the trace.

I wonder if sch_direct_xmit is acting as an optimization to bypass the
queueing layer, and if that is somehow bypassing the queue index
checks you mentioned?

I'll try to dig a bit deeper into that specific flow, but here is the
trace in the meantime:

Call Trace:
? __warn+0x94/0xe0
? gve_tx+0xa9f/0xc30 [gve]
? gve_tx+0xa9f/0xc30 [gve]
? report_bug+0xb1/0xe0
? do_error_trap+0x9e/0xd0
? do_invalid_op+0x36/0x40
? gve_tx+0xa9f/0xc30 [gve]
? invalid_op+0x14/0x20
? gve_tx+0xa9f/0xc30 [gve]
? netif_skb_features+0xcf/0x2a0
dev_hard_start_xmit+0xd7/0x240
sch_direct_xmit+0x9f/0x370
__dev_queue_xmit+0xa04/0xc50
ip_finish_output2+0x26d/0x430
? __ip_finish_output+0xdf/0x1d0
ip_output+0x70/0xf0
__ip_queue_xmit+0x165/0x400
__tcp_transmit_skb+0xa6b/0xb90
tcp_connect+0xae3/0xd40
tcp_v4_connect+0x476/0x4f0
__inet_stream_connect+0xda/0x380

Could you point us to the specific qdisc fixes you recall? We'd like
to verify if the issue we are seeing on the older kernel is indeed one
of those known/fixed bugs.

If it turns out this is fully resolved in the core network stack
upstream, we can drop this patch for the mainline driver. However, if
there is ambiguity, do you think there is value in keeping this check
to prevent the driver from crashing on invalid input?

We already have many costly checks, and netdev_core_pick_tx() should
already prevent such a panic.
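
To make the trade-off under discussion concrete, a driver-side defensive
check could look roughly like the user-space sketch below. It is not the code
from the series: every name (model_priv, model_tx_ring, model_xmit) is
hypothetical, and dropping the packet while counting the drop is an assumed
policy. The extra comparison on every transmit is the per-packet cost being
weighed against the protection it gives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum model_tx_status { MODEL_TX_OK, MODEL_TX_DROPPED };

/* Hypothetical per-queue state; a real ring holds descriptors, not a count. */
struct model_tx_ring {
	unsigned long packets;
};

/* Hypothetical driver private data. */
struct model_priv {
	struct model_tx_ring *tx;  /* array of num_tx_queues rings */
	uint16_t num_tx_queues;    /* number of valid entries in tx[] */
	unsigned long tx_dropped;  /* packets rejected by the bounds check */
};

/*
 * Validate the queue index before using it to index tx[]. An invalid index
 * is counted and reported as a drop instead of reading out of bounds.
 */
static enum model_tx_status model_xmit(struct model_priv *priv,
				       uint16_t queue_index)
{
	assert(priv != NULL);

	if (queue_index >= priv->num_tx_queues) {
		priv->tx_dropped++;
		return MODEL_TX_DROPPED;
	}

	priv->tx[queue_index].packets++;
	return MODEL_TX_OK;
}
```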

Thanks,
Ankit Garg
