[PATCH net-next v6 0/8] tun/tap & vhost-net: netdev queue flow control to... | netdev

[PATCH net-next v6 0/8] tun/tap & vhost-net: netdev queue flow control to avoid ptr_ring tail drop

From: Simon Schippers <hidden>
Date: 2025-11-21 09:23:12
Also in: kvm, lkml, virtualization

On 11/21/25 07:19, Jason Wang wrote:

On Thu, Nov 20, 2025 at 11:30 PM Simon Schippers
[off-list ref] wrote:

quoted

This patch series deals with tun/tap and vhost-net which drop incoming
SKBs whenever their internal ptr_ring buffer is full. Instead, with this
patch series, the associated netdev queue is stopped before this happens.
This allows the connected qdisc to function correctly as reported by [1]
and improves application-layer performance, see our paper [2]. Meanwhile
the theoretical performance differs only slightly:

+--------------------------------+-----------+----------+
| pktgen benchmarks to Debian VM | Stock     | Patched  |
| i5 6300HQ, 20M packets         |           |          |
+-----------------+--------------+-----------+----------+
| TAP             | Transmitted  | 195 Kpps  | 183 Kpps |
|                 +--------------+-----------+----------+
|                 | Lost         | 1615 Kpps | 0 pps    |
+-----------------+--------------+-----------+----------+
| TAP+vhost_net   | Transmitted  | 589 Kpps  | 588 Kpps |
|                 +--------------+-----------+----------+
|                 | Lost         | 1164 Kpps | 0 pps    |
+-----------------+--------------+-----------+----------+

Hi Jason,

thank you for your reply!

PPS drops somehow for TAP, any reason for that?

I have no explicit explanation for that except general overheads coming
with this implementation.

Btw, I had some questions:

1) most of the patches in this series would introduce non-trivial
impact on the performance, we probably need to benchmark each or split
the series. What's more we need to run TCP benchmark
(throughput/latency) as well as pktgen see the real impact

What could be done, IMO, is to activate tun_ring_consume() /
tap_ring_consume() before enabling tun_ring_produce(). Then we could see
if this alone drops performance.

For TCP benchmarks, you mean userspace performance like iperf3 between a
host and a guest system?

2) I see this:

        if (unlikely(tun_ring_produce(&tfile->tx_ring, queue, skb))) {
                drop_reason = SKB_DROP_REASON_FULL_RING;
                goto drop;
        }

So there could still be packet drop? Or is this related to the XDP path?

Yes, there can be packet drops after a ptr_ring resize or a ptr_ring
unconsume. Since those two happen so rarely, I figured we should just
drop in this case.

3) The LLTX change would have performance implications, but the
benmark doesn't cover the case where multiple transmission is done in
parallel

Do you mean multiple applications that produce traffic and potentially
run on different CPUs?

4) After the LLTX change, it seems we've lost the synchronization with
the XDP_TX and XDP_REDIRECT path?

I must admit I did not take a look at XDP and cannot really judge if/how
lltx has an impact on XDP. But from my point of view, __netif_tx_lock()
instead of __netif_tx_acquire(), is executed before the tun_net_xmit()
call and I do not see the impact for XDP, which calls its own methods.

5) The series introduces various ptr_ring helpers with lots of
ordering stuff which is complicated, I wonder if we first have a
simple patch to implement the zero packet loss

I personally don't see how a simpler patch is possible without using
discouraged practices like returning NETDEV_TX_BUSY in tun_net_xmit or
spin locking between producer and consumer. But I am open for
suggestions :)

quoted

This patch series includes tun/tap, and vhost-net because they share
logic. Adjusting only one of them would break the others. Therefore, the
patch series is structured as follows:
1+2: new ptr_ring helpers for 3
3: tun/tap: tun/tap: add synchronized ring produce/consume with queue
management
4+5+6: tun/tap: ptr_ring wrappers and other helpers to be called by
vhost-net
7: tun/tap & vhost-net: only now use the previous implemented functions to
not break git bisect
8: tun/tap: drop get ring exports (not used anymore)

Possible future work:
- Introduction of Byte Queue Limits as suggested by Stephen Hemminger

This seems to be not easy. The tx completion depends on the userspace behaviour.

I agree, but I really would like to reduce the buffer bloat caused by the
default 500 TUN / 1000 TAP packet queue without losing performance.

quoted

- Adaption of the netdev queue flow control for ipvtap & macvtap

[1] Link: https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
[2] Link: https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf

Thanks

Thanks! :)

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help