Re: [PATCH net-next 0/5] veth: add Byte Queue Limits (BQL) support
From: Toke Høiland-Jørgensen <hidden>
Date: 2026-03-28 20:06:54
Jonas Köppeler writes:

> On 3/27/26 13:49, Jesper Dangaard Brouer wrote:
>> On 27/03/2026 10.50, Toke Høiland-Jørgensen wrote:
>>> hawk@kernel.org writes:
>>>> From: Jesper Dangaard Brouer <hawk@kernel.org>
>>>>
>>>> This series adds BQL (Byte Queue Limits) to the veth driver, reducing latency by dynamically limiting in-flight bytes in the ptr_ring and moving buffering into the qdisc, where AQM algorithms can act on it.
>>>>
>>>> Problem: veth's 256-entry ptr_ring acts as a "dark buffer": packets queued there are invisible to the qdisc's AQM. Under load, the ring fills completely (DRV_XOFF backpressure), adding up to 256 packets of unmanaged latency before the qdisc even sees congestion.
>>>>
>>>> Solution: BQL (STACK_XOFF) dynamically limits in-flight bytes, stopping the queue before the ring fills. This keeps the ring shallow and pushes excess packets into the qdisc, where sojourn-based AQM can measure and drop them.
>>>
>>> So one question here: is *Byte* Queue Limits really the right thing for veth? As you mention above, the ptr_ring is sized in number of packets. On a physical NIC, accounting in bytes makes sense because there's a fixed line rate, so bytes translate directly into latency. But on a veth device, stack processing is per packet, and most processing takes the same amount of time regardless of packet size (e.g., netfilter rules that operate on the skb only). So my worry is that if you account in bytes and there's a mix of big and small packets, the BQL algorithm would scale to a "too large" value, which would allow a lot of small packets to queue up, adding extra latency (or even overflowing the ring buffer if the ratio is large enough).
>>>
>>> Have you run any such experiments?
>>
>> Thanks for bringing this up. Yes, we have considered this (and agree). Jonas is conducting some experiments; I'll let Jonas answer.
>
> Hi,
>
> I used the provided selftest, modified so that the payload size alternates between 1400 bytes and sizeof(struct pkt_hdr) = 24 bytes every 5000 packets. The receiver was slowed down using 10K iptables rules.
> I could confirm that the receive queue filled up to ~66 packets, whereas the BQL limit is around 2884 bytes, corresponding to approximately 2 x 1400-byte packets.
>
> I compared two accounting strategies: using skb->len vs. a fixed size of 1.
>
> Ping results over 5 runs using skb->len accounting:
> rtt min/avg/max/mdev = 0.636/2.784/9.543/1.735 ms
> rtt min/avg/max/mdev = 0.629/2.947/10.587/1.927 ms
> rtt min/avg/max/mdev = 0.587/2.966/11.625/1.963 ms
> rtt min/avg/max/mdev = 0.589/3.006/10.694/1.979 ms
>
> Ping results over 5 runs using fixed-size (1) accounting:
> rtt min/avg/max/mdev = 0.587/2.446/6.261/1.065 ms
> rtt min/avg/max/mdev = 0.641/2.339/6.008/0.950 ms
> rtt min/avg/max/mdev = 0.688/2.527/5.506/1.086 ms
> rtt min/avg/max/mdev = 0.596/2.411/5.228/1.041 ms
>
> The avg and max RTT are consistently lower with fixed-size accounting. This suggests that the excess packets buffered in the ring add to the latency.
Right, so this sounds like fixed-size accounting is the way to go, then. Cool :)

-Toke