Re: [PATCH net-next 0/5] veth: add Byte Queue Limits (BQL) support

From: Toke Høiland-Jørgensen <hidden>
Date: 2026-03-28 20:06:54

Jonas Köppeler [off-list ref] writes:

On 3/27/26 13:49, Jesper Dangaard Brouer wrote:


On 27/03/2026 10.50, Toke Høiland-Jørgensen wrote:

hawk@kernel.org writes:

From: Jesper Dangaard Brouer <hawk@kernel.org>

This series adds BQL (Byte Queue Limits) to the veth driver, reducing
latency by dynamically limiting in-flight bytes in the ptr_ring and
moving buffering into the qdisc where AQM algorithms can act on it.

Problem:
   veth's 256-entry ptr_ring acts as a "dark buffer" -- packets queued
   there are invisible to the qdisc's AQM.  Under load, the ring fills
   completely (DRV_XOFF backpressure), adding up to 256 packets of
   unmanaged latency before the qdisc even sees congestion.

Solution:
   BQL (STACK_XOFF) dynamically limits in-flight bytes, stopping the
   queue before the ring fills.  This keeps the ring shallow and pushes
   excess packets into the qdisc, where sojourn-based AQM can measure
   and drop them.

So one question here: Is *Byte* queue limits really the right thing for
veth? As you mention above, the ptr_ring is sized in a number of
packets. On a physical NIC, accounting bytes makes sense because there's
a fixed line rate, so bytes turn directly into latency.

But on a veth device, the stack processing is per packet, and most
processing takes the same amount of time regardless of the size of the
packet (e.g., netfilter rules that operate on the skb only).

So my worry would be that when you're accounting in bytes, if there's a
mix of big and small packets, you'd end up with the BQL algorithm
scaling to a "too large" value, which would allow a lot of small packets
to be queued up, adding extra latency (or even overflowing the ring
buffer if the ratio is large enough).
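
To make the concern concrete, here is a minimal sketch of a simplified
BQL-style admission check. It is a toy model, not the kernel's dql code: it
assumes a fixed limit (no adaptation), no completions while packets are
being queued, and that the queue is stopped once the in-flight total exceeds
the limit. The limit of 3000 bytes and the 1500/64-byte packet lengths are
hypothetical values chosen only for illustration.

```python
def packets_admitted(limit, costs):
    """Count packets queued before a simplified BQL-style check stops the queue.

    Assumptions of this toy model: the limit is fixed, nothing completes
    while we enqueue, and the queue stops once in-flight cost exceeds limit.
    """
    in_flight = 0
    admitted = 0
    for cost in costs:
        in_flight += cost
        admitted += 1
        if in_flight > limit:
            break  # queue would be stopped here (STACK_XOFF in the real driver)
    return admitted


RING_SIZE = 256      # ptr_ring entries, from the cover letter
LIMIT_BYTES = 3000   # hypothetical byte limit tuned by large packets

# Byte accounting: each packet costs its length.
big = packets_admitted(LIMIT_BYTES, [1500] * RING_SIZE)
small = packets_admitted(LIMIT_BYTES, [64] * RING_SIZE)

# Fixed-size accounting: each packet costs 1, and the limit is in packets.
fixed = packets_admitted(2, [1] * RING_SIZE)

print(f"byte accounting:  {big} large or {small} small packets in flight")
print(f"fixed accounting: {fixed} packets in flight, regardless of size")
```

In this model the same byte limit admits 3 large packets but 47 small ones,
while a per-packet limit admits the same number either way. If per-packet
processing cost dominates, the second behaviour is the one that bounds
latency.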

Have you run any such experiments?

Thanks for bringing this up.
Yes, we have considered this (and agree).

Jonas is conducting some experiments.
I will let Jonas answer.

Hi,

I used the provided selftest, modified so that the payload size alternates
between 1400 bytes and sizeof(struct pkt_hdr) = 24 bytes every 5000 packets.

The receiver was slowed down using 10K iptables rules. I could confirm that
the receive queue filled up to ~66 packets, whereas the BQL limit is around
2884 bytes, corresponding to approximately 2 x 1400-byte packets.

I compared two accounting strategies: using skb->len vs. a fixed size of 1.

Ping results over 5 runs using skb->len accounting:

   rtt min/avg/max/mdev = 0.636/2.784/ 9.543/1.735 ms
   rtt min/avg/max/mdev = 0.629/2.947/10.587/1.927 ms
   rtt min/avg/max/mdev = 0.587/2.966/11.625/1.963 ms
   rtt min/avg/max/mdev = 0.589/3.006/10.694/1.979 ms

Ping results over 5 runs using fixed size (1) accounting:

   rtt min/avg/max/mdev = 0.587/2.446/6.261/1.065 ms
   rtt min/avg/max/mdev = 0.641/2.339/6.008/0.950 ms
   rtt min/avg/max/mdev = 0.688/2.527/5.506/1.086 ms
   rtt min/avg/max/mdev = 0.596/2.411/5.228/1.041 ms

The avg and max RTT are consistently lower with the fixed-size accounting.
This suggests that the excess packets buffered under skb->len accounting
add measurable latency.
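
For anyone who wants to compare the two sets of runs numerically, a small
sketch along these lines should do. It only parses the ping summary lines
listed above and averages the avg and max columns. The regular expression
and the helper name are illustrative choices, not part of the selftest.

```python
import re
from statistics import mean

# Matches ping's summary line; tolerates the padding space before the max field.
SUMMARY = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/\s*([\d.]+)/\s*([\d.]+)/\s*([\d.]+) ms"
)

SKB_LEN = """
rtt min/avg/max/mdev = 0.636/2.784/ 9.543/1.735 ms
rtt min/avg/max/mdev = 0.629/2.947/10.587/1.927 ms
rtt min/avg/max/mdev = 0.587/2.966/11.625/1.963 ms
rtt min/avg/max/mdev = 0.589/3.006/10.694/1.979 ms
"""

FIXED = """
rtt min/avg/max/mdev = 0.587/2.446/6.261/1.065 ms
rtt min/avg/max/mdev = 0.641/2.339/6.008/0.950 ms
rtt min/avg/max/mdev = 0.688/2.527/5.506/1.086 ms
rtt min/avg/max/mdev = 0.596/2.411/5.228/1.041 ms
"""


def summarize(text):
    """Return (mean of avg RTT, mean of max RTT) in ms over all summary lines."""
    rows = [tuple(float(v) for v in m.groups()) for m in SUMMARY.finditer(text)]
    return mean(r[1] for r in rows), mean(r[2] for r in rows)


skb_avg, skb_max = summarize(SKB_LEN)
fix_avg, fix_max = summarize(FIXED)
print(f"skb->len: avg {skb_avg:.2f} ms, max {skb_max:.2f} ms")
print(f"fixed(1): avg {fix_avg:.2f} ms, max {fix_max:.2f} ms")
```

Over the listed runs this gives a mean avg RTT of about 2.93 ms vs 2.43 ms,
and a mean max RTT of about 10.61 ms vs 5.75 ms.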

Right, so this sounds like fixed-size accounting is the way to go, then.
Cool :)

-Toke
