Re: [PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops
From: Jason Wang <jasowang@redhat.com>
Date: 2025-08-12 03:10:24
On Tue, Aug 12, 2025 at 6:04 AM Simon Schippers [off-list ref] wrote:
> This patch is the result of our paper with the title "The NODROP Patch:
> Hardening Secure Networking for Real-time Teleoperation by Preventing
> Packet Drops in the Linux TUN Driver" [1].
> It deals with the tun_net_xmit function, which drops SKBs with the reason
> SKB_DROP_REASON_FULL_RING whenever the tx_ring (TUN queue) is full,
> resulting in reduced TCP performance and packet loss for bursty video
> streams when used over VPNs.
>
> The abstract reads as follows:
> "Throughput-critical teleoperation requires robust and low-latency
> communication to ensure safety and performance. Often, these kinds of
> applications are implemented in Linux-based operating systems and transmit
> over virtual private networks, which ensure encryption and ease of use by
> providing a dedicated tunneling interface (TUN) to user space
> applications. In this work, we identified a specific behavior in the Linux
> TUN driver, which results in significant performance degradation due to
> the sender stack silently dropping packets. This design issue drastically
> impacts real-time video streaming, inducing up to 29 % packet loss with
> noticeable video artifacts when the internal queue of the TUN driver is
> reduced to 25 packets to minimize latency. Furthermore, a small queue
> length also drastically reduces the throughput of TCP traffic due to many
> retransmissions. Instead, with our open-source NODROP Patch, we propose
> generating backpressure in case of burst traffic or network congestion.
> The patch effectively addresses the packet-dropping behavior, hardening
> real-time video streaming and improving TCP throughput by 36 % in high
> latency scenarios."
>
> In addition to the mentioned performance and latency improvements for VPN
> applications, this patch also allows the proper usage of qdiscs. For
> example, fq_codel cannot control the queuing delay when packets are
> already dropped in the TUN driver. This issue is also described in [2].
>
> The performance evaluation of the paper (see Fig. 4) showed a 4%
> performance hit for a single-queue TUN with the default TUN queue size of
> 500 packets. However, it is important to note that with the proposed patch
> no packet drop ever occurred, even with a TUN queue size of 1 packet.
> The utilized validation pipeline is available under [3].
>
> As reducing the TUN queue to a size as small as 5 packets showed no
> further performance hit in the paper, a reduction of the default TUN queue
> size might be desirable alongside this patch. A reduction would obviously
> reduce bufferbloat and memory requirements.
>
> Implementation details:
> - The netdev queue start/stop flow control is utilized.
> - Compatible with multi-queue by only stopping/waking the specific
>   netdevice subqueue.
> - No additional locking is used.
>
> In the tun_net_xmit function:
> - Stopping the subqueue is done when the tx_ring gets full after inserting
>   the SKB into the tx_ring.
> - In the unlikely case when the insertion with ptr_ring_produce fails, the
>   old dropping behavior is used for this SKB.
>
> In the tun_ring_recv function:
> - Waking the subqueue is done after consuming an SKB from the tx_ring when
>   the tx_ring is empty. Waking the subqueue as soon as the tx_ring has any
>   available space (that is, whenever it is not full) caused crashes in our
>   testing. We are open to suggestions.
> - When the tx_ring is configured to be small (for example to hold 1 SKB),
>   queuing might be stopped in the tun_net_xmit function while at the same
>   time, ptr_ring_consume is not able to grab an SKB. This prevents
>   tun_net_xmit from being called again and causes tun_ring_recv to wait
>   indefinitely for an SKB in the blocking wait queue. Therefore, the netdev
>   queue is woken in the wait queue if it has stopped.
> - Because the tun_struct is required to get the tx_queue into the new txq
>   pointer, the tun_struct is passed in tun_do_read as well. This is likely
>   faster than trying to get it via the tun_file tfile, because the latter
>   uses an RCU lock.
>
> We are open to suggestions regarding the implementation :)
> Thank you for your work!

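For readers following the thread, the stop/wake rules described above can be modelled in plain userspace C. This is only a minimal single-threaded sketch under stated assumptions, not the patch itself: struct model_queue, model_xmit, model_recv and RING_SIZE are hypothetical names, and the boolean "stopped" merely stands in for the stopped state of the netdev subqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 4 /* hypothetical; the real tx_ring size is configurable */

/* Userspace stand-in for the tx_ring plus the state of its subqueue. */
struct model_queue {
	int slots[RING_SIZE];
	size_t head, tail, count;
	bool stopped;        /* models a stopped netdev subqueue */
	unsigned long drops; /* packets lost on the old drop path */
};

/* Producer side, following the rules described for tun_net_xmit. */
static bool model_xmit(struct model_queue *q, int pkt)
{
	assert(q->count <= RING_SIZE);
	if (q->count == RING_SIZE) {
		/* Insertion fails: fall back to the old dropping behavior. */
		q->drops++;
		return false;
	}
	q->slots[q->head] = pkt;
	q->head = (q->head + 1) % RING_SIZE;
	q->count++;
	if (q->count == RING_SIZE)
		q->stopped = true; /* ring became full after this insert */
	return true;
}

/* Consumer side, following the rules described for tun_ring_recv. */
static bool model_recv(struct model_queue *q, int *pkt)
{
	if (q->count == 0) {
		/* Nothing to read: make sure a stopped queue is woken so
		 * the producer is not blocked forever. */
		q->stopped = false;
		return false;
	}
	*pkt = q->slots[q->tail];
	q->tail = (q->tail + 1) % RING_SIZE;
	q->count--;
	if (q->count == 0)
		q->stopped = false; /* wake only once the ring is empty */
	return true;
}
```

In this model, a producer that respects the stopped flag never reaches the drop path: the queue is stopped by the insert that fills the ring and stays stopped until the consumer has drained it completely.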
I would like to see some benchmark results. Not only VPN but also a classical VM setup that is using vhost-net + TAP.

> [1] Link: https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
> [2] Link: https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
> [3] Link: https://github.com/tudo-cni/nodrop
>
> Co-developed-by: Tim Gebauer <redacted>
> Signed-off-by: Tim Gebauer <redacted>
> Signed-off-by: Simon Schippers <redacted>
> ---
> V1 -> V2: Removed NETDEV_TX_BUSY return case in tun_net_xmit and removed
> unnecessary netif_tx_wake_queue in tun_ring_recv.
>
>  drivers/net/tun.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index cc6c50180663..81abdd3f9aca 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1060,13 +1060,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>
>  	nf_reset_ct(skb);
>
> -	if (ptr_ring_produce(&tfile->tx_ring, skb)) {
> +	queue = netdev_get_tx_queue(dev, txq);
> +	if (unlikely(ptr_ring_produce(&tfile->tx_ring, skb))) {
> +		netif_tx_stop_queue(queue);
>  		drop_reason = SKB_DROP_REASON_FULL_RING;

This would still drop the packet. Should we detect when the ring is about to become full and stop the queue then, as virtio-net does?

Thanks
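
To make the difference concrete, here is a small userspace sketch comparing the two stop policies on a burst with no consumer running. It is an illustration under assumptions, not driver code: simulate_burst, struct sim_result and enum stop_policy are hypothetical names, and a "held back" packet stands for one that the stack keeps in the qdisc because the queue is stopped.

```c
#include <stdbool.h>

enum stop_policy {
	STOP_ON_FAILURE, /* as in the quoted hunk: stop once produce failed */
	STOP_WHEN_FULL   /* stop right after the insert that fills the ring */
};

struct sim_result {
	unsigned int sent;      /* packets placed into the ring */
	unsigned int dropped;   /* packets lost because the ring was full */
	unsigned int held_back; /* packets not offered while stopped */
};

/* Offer `burst` packets to an initially empty ring of `ring_size` slots. */
static struct sim_result simulate_burst(enum stop_policy policy,
					unsigned int ring_size,
					unsigned int burst)
{
	struct sim_result r = {0, 0, 0};
	unsigned int used = 0;
	bool stopped = false;

	for (unsigned int i = 0; i < burst; i++) {
		if (stopped) {
			/* Backpressure: the packet stays with the sender. */
			r.held_back++;
			continue;
		}
		if (used == ring_size) {
			/* Produce fails, so this packet is lost. */
			r.dropped++;
			if (policy == STOP_ON_FAILURE)
				stopped = true;
			continue;
		}
		used++;
		r.sent++;
		if (policy == STOP_WHEN_FULL && used == ring_size)
			stopped = true;
	}
	return r;
}
```

With a ring of 4 slots and a burst of 10 packets, STOP_ON_FAILURE loses the fifth packet before the queue stops, while STOP_WHEN_FULL stops after the fourth insert and loses nothing.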