Hello netdev maintainers, I would like to report a slab-out-of-bounds write reachable through mini_qdisc_pair_swap() during clsact/ingress qdisc teardown, and to propose a fix grounded in an analysis of the actual object lifetimes involved. A first cut at this report suggested a check of the form "miniq_old == &miniqp->miniq1 || miniq_old == &miniqp->miniq2" inside mini_qdisc_pair_swap(); on closer reading of the code that check is incorrect, and the fix belongs in the teardown path instead. Details below. Bug --- KASAN: slab-out-of-bounds Write in mini_qdisc_pair_swap File: net/sched/sch_generic.c Function: mini_qdisc_pair_swap() Affected versions ----------------- - linux-6.1.133 - linux-6.12.91 - mainline (as of this report) Faulting write -------------- Inside mini_qdisc_pair_swap(), at the tail: if (miniq_old) miniq_old->rcu_state = start_poll_synchronize_rcu(); miniq_old is taken from rcu_dereference_protected(*miniqp->p_miniq, 1) and, in a correct...
From: boz baba <hidden>
Date: 2026-06-02 19:33:27
Hello netdev maintainers,
I would like to report a slab-out-of-bounds write reachable through
mini_qdisc_pair_swap() during clsact/ingress qdisc teardown, and to
propose a fix grounded in an analysis of the actual object lifetimes
involved. A first cut at this report suggested a check of the form
"miniq_old == &miniqp->miniq1 || miniq_old == &miniqp->miniq2" inside
mini_qdisc_pair_swap(); on closer reading of the code that check is
incorrect, and the fix belongs in the teardown path instead. Details
below.
Bug
---
KASAN: slab-out-of-bounds Write in mini_qdisc_pair_swap
File: net/sched/sch_generic.c
Function: mini_qdisc_pair_swap()
Affected versions
-----------------
- linux-6.1.133
- linux-6.12.91
- mainline (as of this report)
Faulting write
--------------
Inside mini_qdisc_pair_swap(), at the tail:
if (miniq_old)
miniq_old->rcu_state = start_poll_synchronize_rcu();
miniq_old is taken from rcu_dereference_protected(*miniqp->p_miniq, 1)
and, in a correct sequence, points at &miniqp->miniq1 or &miniqp->miniq2.
Object lifetimes (as the code stands today)
-------------------------------------------
- struct mini_Qdisc_pair is embedded by value inside the qdisc's
private area. For clsact, clsact_sched_data carries miniqp_ingress
and miniqp_egress; for ingress, ingress_sched_data carries miniqp.
See net/sched/sch_ingress.c (clsact_sched_data, ingress_sched_data).
- &miniqp->miniq1 and &miniqp->miniq2 are therefore interior pointers
into qdisc_priv(sch). They share fate with the Qdisc allocation.
- miniqp->p_miniq does NOT point into clsact_sched_data. It points
into the device's per-direction tcx_entry->miniq slot, set up in
*_init via mini_qdisc_pair_init(&q->miniqp_*, sch,
&tcx_entry(entry)->miniq). That slot is what publishes the active
miniq to the fast path.
- The only caller of mini_qdisc_pair_swap() is the chain_head_change
callback (clsact_chain_head_change), wired into the tcf_block at
init time with priv = &q->miniqp_*.
- Teardown calls tcf_block_put_ext() in clsact_destroy()/
ingress_destroy(), which is intended to detach the callback under
RTNL before the qdisc private area is released.
Why the naive fix is wrong
--------------------------
An earlier draft of this report proposed:
if (miniq_old &&
(miniq_old == &miniqp->miniq1 || miniq_old == &miniqp->miniq2))
miniq_old->rcu_state = start_poll_synchronize_rcu();
This is not a real safety check:
1. miniq_old is always assigned from &this_miniqp->miniqN upstream in
the same function (the "miniq = miniq_old != &miniqp->miniq1 ?
&miniqp->miniq1 : &miniqp->miniq2" line), so the equality is true
by construction on the non-buggy path. The check does nothing in
the common case.
2. If clsact_sched_data has been freed, miniqp and miniq_old were
freed together -- they live in the same allocation. The address
comparison can still spuriously hold in stale memory, and the
subsequent write to miniq_old->rcu_state is exactly the UAF /
OOB write the check claims to prevent.
3. Even taking the equality at face value, the write still lands in
miniqp's storage. Guarding the write with an equality test on
miniqp itself proves nothing about miniqp's liveness; the two
pointers share fate.
In short, the proposed check validates miniq_old against an oracle
(miniqp) that has the same lifetime as miniq_old, so it cannot
distinguish a live target from a dead one.
Where the fix belongs
---------------------
The invariant that needs to hold is:
No chain_head_change callback may run after the owning qdisc's
private area has been released.
That is a caller-side / teardown-ordering invariant. The right place
to enforce it is in ingress_destroy() / clsact_destroy(), not inside
mini_qdisc_pair_swap().
Proposed fix
------------
(a) Drain in-flight callbacks before releasing qdisc_priv() storage.
After tcf_block_put_ext() returns, a chain_head_change dispatch
that was already in flight on another CPU may still be executing
against &q->miniqp_*. Add a synchronize_rcu() between
tcf_block_put_ext() and the rest of destroy:
--- a/net/sched/sch_ingress.c
+++ b/net/sched/sch_ingress.c
@@ ingress_destroy(struct Qdisc *sch)
tcf_block_put_ext(q->block, sch, &q->block_info);
+
+ /* Ensure any chain_head_change callback already dispatched
+ * before tcf_block_put_ext() returned has finished writing
+ * into q->miniqp before qdisc_priv() storage is released.
+ */
+ synchronize_rcu();
if (mini_qdisc_pair_inited(&q->miniqp)) {
Apply the same drain in clsact_destroy() after both
tcf_block_put_ext() calls.
(b) Skip the trailing rcu_state hand-off on the teardown path.
On the tp_head == NULL branch the function already publishes
NULL into *miniqp->p_miniq, so no future reader can observe
miniq_old via that slot. The trailing
"miniq_old->rcu_state = start_poll_synchronize_rcu()" exists to
gate a *future* reuse of the same miniq slot; once the slot is
NULLed and no further swaps will occur, the hand-off has no
consumer. Skipping it removes the only write that can race
teardown:
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ mini_qdisc_pair_swap()
if (!tp_head) {
RCU_INIT_POINTER(*miniqp->p_miniq, NULL);
+ /* Teardown: *p_miniq is now NULL, no future swap
+ * will read miniq_old->rcu_state, so the hand-off
+ * write is unnecessary and would only widen the
+ * window for a use-after-free against miniq_old.
+ */
+ return;
} else {
miniq = miniq_old != &miniqp->miniq1 ?
&miniqp->miniq1 : &miniqp->miniq2;
...
rcu_assign_pointer(*miniqp->p_miniq, miniq);
}
if (miniq_old)
miniq_old->rcu_state = start_poll_synchronize_rcu();
(a) is the structural fix; (b) is a narrower belt-and-suspenders
that also stands on its own and is, I believe, semantically a no-op
on the non-buggy path.
Alternative structural fix
--------------------------
If for some reason chain_head_change callbacks need to be allowed
to outlive tcf_block_put_ext(), the cleaner long-term fix is to
decouple mini_Qdisc_pair lifetime from qdisc_priv: allocate it with
kzalloc() in *_init, store a pointer in sched_data, and release it
via kfree_rcu() in destroy. The existing
start_poll_synchronize_rcu() hand-off then becomes safe because the
miniqp storage outlives one grace period by construction.
I'm happy to test patches or provide more information.
Thanks,
boz