On Mon, Nov 10, 2025 at 8:44 AM Jakub Kicinski [off-list ref] wrote:
On Mon, 10 Nov 2025 09:44:55 +0000 Eric Dumazet wrote:
Avoid up to two cache line misses in qdisc dequeue() to fetch
skb_shinfo(skb)->gso_segs/gso_size while qdisc spinlock is held.
Idea is to cache gso_segs at enqueue time before spinlock is
acquired, in the first skb cache line, where we already
have qdisc_skb_cb(skb)->pkt_len.
This series gives an 8% improvement in a TX-intensive workload.
(120 Mpps -> 130 Mpps on a Turin host, IDPF with 32 TX queues)
According to CI this breaks a bunch of tests.
https://netdev.bots.linux.dev/contest.html?branch=net-next-2025-11-10--12-00
I think they all hit:
[ 20.682474][ T231] WARNING: CPU: 3 PID: 231 at ./include/net/sch_generic.h:843 __dev_xmit_skb+0x786/0x1550
Oh well, I will add this in V2, thank you !
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b76436ec3f4aa412bac1be3371f5c7c6245cc362..79501499dafba56271b9ebd97a8f379ffdc83cac 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -841,7 +841,7 @@ static inline unsigned int qdisc_pkt_segs(const struct sk_buff *skb)
 	u32 pkt_segs = qdisc_skb_cb(skb)->pkt_segs;
 	DEBUG_NET_WARN_ON_ONCE(pkt_segs !=
-			skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1);
+			(skb_is_gso(skb) ? skb_shinfo(skb)->gso_segs : 1));
 	return pkt_segs;
 }