Thread: 10 messages, 5 authors, 2025-08-15

[PATCH net v2] TUN/TAP: Improving throughput and latency by avoiding SKB drops

From: Simon Schippers <hidden>
Date: 2025-08-11 22:04:44
Also in: lkml
Subsystem: networking drivers, the rest, tun/tap driver · Maintainers: Andrew Lunn, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Linus Torvalds, Willem de Bruijn, Jason Wang

This patch is the result of our paper titled "The NODROP Patch:
Hardening Secure Networking for Real-time Teleoperation by Preventing
Packet Drops in the Linux TUN Driver" [1].
It deals with the tun_net_xmit function, which drops SKBs with the reason
SKB_DROP_REASON_FULL_RING whenever the tx_ring (TUN queue) is full. This
results in reduced TCP performance and packet loss for bursty video
streams sent over VPNs.
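The drop-on-full behavior can be illustrated with a small userspace model. This is a sketch only: `struct ring`, `ring_produce` and `old_xmit_burst` are hypothetical stand-ins, not the kernel's ptr_ring API, and the burst size is arbitrary. A burst larger than the ring simply loses the excess packets, which is what the old tun_net_xmit does with SKB_DROP_REASON_FULL_RING.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a fixed-size ring (NOT the kernel ptr_ring). */
#define RING_CAP 8

struct ring {
	int slots[RING_CAP];
	size_t tail, count;
	size_t size; /* configured ring length, 1..RING_CAP */
};

/* Like ptr_ring_produce(): returns 0 on success, non-zero when the ring is full. */
static int ring_produce(struct ring *r, int pkt)
{
	assert(r->size > 0 && r->size <= RING_CAP);
	if (r->count == r->size)
		return -1;
	r->slots[r->tail] = pkt;
	r->tail = (r->tail + 1) % r->size;
	r->count++;
	return 0;
}

/* Old tun_net_xmit behavior: a full ring means the packet is dropped.
 * Returns how many packets of the burst were lost. */
static size_t old_xmit_burst(struct ring *r, size_t burst)
{
	size_t drops = 0;

	for (size_t i = 0; i < burst; i++)
		if (ring_produce(r, (int)i))
			drops++; /* SKB_DROP_REASON_FULL_RING in the real driver */
	return drops;
}
```

With a ring of 2 slots and no reader, a burst of 5 packets yields 3 drops; nothing pushes back on the sender.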

The abstract reads as follows:
"Throughput-critical teleoperation requires robust and low-latency
communication to ensure safety and performance. Often, these kinds of
applications are implemented in Linux-based operating systems and transmit
over virtual private networks, which ensure encryption and ease of use by
providing a dedicated tunneling interface (TUN) to user space
applications. In this work, we identified a specific behavior in the Linux
TUN driver, which results in significant performance degradation due to
the sender stack silently dropping packets. This design issue drastically
impacts real-time video streaming, inducing up to 29 % packet loss with
noticeable video artifacts when the internal queue of the TUN driver is
reduced to 25 packets to minimize latency. Furthermore, a small queue
length also drastically reduces the throughput of TCP traffic due to many
retransmissions. Instead, with our open-source NODROP Patch, we propose
generating backpressure in case of burst traffic or network congestion.
The patch effectively addresses the packet-dropping behavior, hardening
real-time video streaming and improving TCP throughput by 36 % in high
latency scenarios."

In addition to the mentioned performance and latency improvements for VPN
applications, this patch also enables proper use of qdiscs. For example,
fq_codel cannot control the queuing delay when packets are already dropped
in the TUN driver. This issue is also described in [2].

The performance evaluation in the paper (see Fig. 4) showed a 4%
performance hit for a single-queue TUN with the default TUN queue size of
500 packets. However, it is important to note that with the proposed
patch, no packet drop ever occurred, even with a TUN queue size of 1 packet.
The validation pipeline used is available at [3].

As reducing the TUN queue size to as few as 5 packets showed no further
performance hit in the paper, it might be desirable to reduce the default
TUN queue size along with this patch. A reduction would also reduce
bufferbloat and memory requirements.
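To put the bufferbloat argument in rough numbers: if the VPN drains the TUN queue at a fixed rate, a packet arriving at a full queue waits behind every packet ahead of it. The helper below is a back-of-the-envelope sketch assuming equal-size packets and a constant drain rate; `max_queue_delay_us` and the 1500-byte / 10 Mbit/s figures are hypothetical and not taken from the paper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case queuing delay (microseconds) behind a full TUN queue.
 * Hypothetical model: every packet is pkt_bytes long and the queue is
 * drained at a constant drain_bps bits per second.
 */
static uint64_t max_queue_delay_us(uint64_t queue_pkts, uint64_t pkt_bytes,
				   uint64_t drain_bps)
{
	assert(drain_bps > 0);
	/* bits waiting ahead of the new packet, converted to microseconds */
	return queue_pkts * pkt_bytes * 8u * 1000000u / drain_bps;
}
```

For 1500-byte packets drained at 10 Mbit/s, a full 500-packet queue adds 600000 µs (0.6 s) of delay, while a 5-packet queue adds 6000 µs (6 ms).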

Implementation details:
- The netdev queue start/stop flow control is utilized.
- Compatible with multi-queue by only stopping/waking the specific
netdevice subqueue.
- No additional locking is used.
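The start/stop flow control can be pictured with a simplified model of per-subqueue state. This is a sketch under stated assumptions, not kernel code: `struct txq_model`, `txq_at`, `txq_stop`, `txq_wake` and `core_enqueue` are hypothetical. It shows the two properties the patch relies on: stopping one subqueue leaves the others untouched, and while a subqueue is stopped, packets stay in the qdisc (where e.g. fq_codel can manage them) instead of being handed to the driver. In the real stack, netif_tx_wake_queue() reschedules the qdisc; the model simply flushes the backlog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_TXQ 4 /* hypothetical number of TUN queues */

struct txq_model {
	bool stopped;         /* analogous to the driver XOFF state of a subqueue */
	size_t qdisc_backlog; /* packets held back in the qdisc */
	size_t sent;          /* packets handed to the driver */
};

static struct txq_model *txq_at(struct txq_model *qs, size_t idx)
{
	assert(idx < NUM_TXQ);
	return &qs[idx];
}

static void txq_stop(struct txq_model *q)
{
	q->stopped = true;
}

static void txq_wake(struct txq_model *q)
{
	q->stopped = false;
	/* Stand-in for the qdisc being rescheduled: release held packets. */
	q->sent += q->qdisc_backlog;
	q->qdisc_backlog = 0;
}

/* Simplified core: a stopped subqueue is not given packets; they wait in the qdisc. */
static void core_enqueue(struct txq_model *q)
{
	if (q->stopped)
		q->qdisc_backlog++;
	else
		q->sent++;
}
```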

In the tun_net_xmit function:
- The subqueue is stopped when the tx_ring becomes full after the SKB has
been inserted into the tx_ring.
- In the unlikely case that insertion with ptr_ring_produce fails, the
subqueue is stopped and the old dropping behavior is used for this SKB.
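The patched control flow of tun_net_xmit can be mirrored in a few lines of userspace C. In this sketch, `struct model`, `ring_produce`, `ring_full` and `patched_xmit` are hypothetical stand-ins for ptr_ring_produce, ptr_ring_full and the netdev subqueue state; only the ordering of the checks follows the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAP 8

/* Hypothetical model: one TUN queue plus its netdev subqueue state. */
struct model {
	int slots[RING_CAP];
	size_t head, count;
	size_t size;        /* configured ring length, 1..RING_CAP */
	bool queue_stopped; /* stands in for netif_tx_stop_queue()/wake */
	size_t drops;
};

static bool ring_full(const struct model *m)
{
	return m->count == m->size;
}

/* Returns 0 on success, non-zero if the ring is full (like ptr_ring_produce). */
static int ring_produce(struct model *m, int pkt)
{
	assert(m->size > 0 && m->size <= RING_CAP);
	if (ring_full(m))
		return -1;
	m->slots[(m->head + m->count) % m->size] = pkt;
	m->count++;
	return 0;
}

/* Mirrors the patched tun_net_xmit() ordering. */
static void patched_xmit(struct model *m, int pkt)
{
	if (ring_produce(m, pkt)) {
		/* Unlikely fallback: stop the queue and drop this packet. */
		m->queue_stopped = true;
		m->drops++;
		return;
	}
	/* The ring just became full: stop the subqueue so further packets
	 * are held back by the stack instead of being dropped here. */
	if (ring_full(m))
		m->queue_stopped = true;
}
```

With a 2-slot ring, the second packet fills the ring and stops the queue without any drop. A third packet is only dropped if the transmit path is still invoked despite the stopped queue, which corresponds to the unlikely fallback.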

In the tun_ring_recv function:
- The subqueue is woken after an SKB has been consumed from the tx_ring,
once the tx_ring is empty. Waking the subqueue whenever the tx_ring has any
available space (i.e. whenever it is not full) caused crashes in our
testing. We are open to suggestions.
- When the tx_ring is configured to be small (for example, to hold 1 SKB),
the queue might be stopped in the tun_net_xmit function while, at the same
time, ptr_ring_consume is unable to grab an SKB. This prevents
tun_net_xmit from being called again and causes tun_ring_recv to wait
indefinitely for an SKB in the blocking wait queue. Therefore, the netdev
queue is woken inside the wait loop if it is stopped.
- Because the tun_struct is required to obtain the tx_queue for the new txq
pointer, the tun_struct is now passed from tun_do_read to tun_ring_recv as
well. This is likely faster than obtaining it via the tun_file tfile,
because that path takes an RCU lock.
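Likewise, the two wake points added to tun_ring_recv can be modelled without any real sleeping. The sketch below is hypothetical (`struct model`, `ring_consume` and `patched_recv_once` are not kernel functions) and collapses one pass of the wait loop plus the `out:` path into a single call: the subqueue is woken when a read drains the ring, and also when the ring is found empty while the subqueue is stopped, which is the small-ring stall described in the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAP 8

/* Hypothetical model: one TUN queue plus its netdev subqueue state. */
struct model {
	int slots[RING_CAP];
	size_t head, count;
	size_t size;        /* configured ring length, 1..RING_CAP */
	bool queue_stopped; /* stands in for netif_tx_queue_stopped() */
};

/* Like ptr_ring_consume(): returns true and stores a packet if one is queued. */
static bool ring_consume(struct model *m, int *pkt)
{
	assert(m->size > 0 && m->size <= RING_CAP);
	if (m->count == 0)
		return false;
	*pkt = m->slots[m->head];
	m->head = (m->head + 1) % m->size;
	m->count--;
	return true;
}

/* One read attempt with the patch's two wake points (no actual sleeping). */
static bool patched_recv_once(struct model *m, int *pkt)
{
	bool got = ring_consume(m, pkt);

	/* Wait-loop wake: nothing to read but the subqueue is stopped, so
	 * without this both sides would wait on each other forever. */
	if (!got && m->queue_stopped)
		m->queue_stopped = false;

	/* out: path wake: only once the ring has been fully drained. */
	if (got && m->count == 0)
		m->queue_stopped = false;

	return got;
}
```

With a 1-slot ring, reading the single queued packet wakes the queue immediately. With a 4-slot ring holding 2 packets, the queue stays stopped until the second read empties the ring, matching the wake-only-when-empty choice.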

We are open to suggestions regarding the implementation :)
Thank you for your work!

[1] Link: https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
[2] Link: https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
[3] Link: https://github.com/tudo-cni/nodrop

Co-developed-by: Tim Gebauer <redacted>
Signed-off-by: Tim Gebauer <redacted>
Signed-off-by: Simon Schippers <redacted>
---
V1 -> V2: Removed NETDEV_TX_BUSY return case in tun_net_xmit and removed 
unnecessary netif_tx_wake_queue in tun_ring_recv.

 drivers/net/tun.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index cc6c50180663..81abdd3f9aca 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1060,13 +1060,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	nf_reset_ct(skb);
 
-	if (ptr_ring_produce(&tfile->tx_ring, skb)) {
+	queue = netdev_get_tx_queue(dev, txq);
+	if (unlikely(ptr_ring_produce(&tfile->tx_ring, skb))) {
+		netif_tx_stop_queue(queue);
 		drop_reason = SKB_DROP_REASON_FULL_RING;
 		goto drop;
 	}
+	if (ptr_ring_full(&tfile->tx_ring))
+		netif_tx_stop_queue(queue);
 
 	/* dev->lltx requires to do our own update of trans_start */
-	queue = netdev_get_tx_queue(dev, txq);
 	txq_trans_cond_update(queue);
 
 	/* Notify and wake up reader process */
@@ -2110,9 +2113,10 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	return total;
 }
 
-static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
+static void *tun_ring_recv(struct tun_struct *tun, struct tun_file *tfile, int noblock, int *err)
 {
 	DECLARE_WAITQUEUE(wait, current);
+	struct netdev_queue *txq;
 	void *ptr = NULL;
 	int error = 0;
 
@@ -2124,6 +2128,7 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
 		goto out;
 	}
 
+	txq = netdev_get_tx_queue(tun->dev, tfile->queue_index);
 	add_wait_queue(&tfile->socket.wq.wait, &wait);
 
 	while (1) {
@@ -2131,6 +2136,10 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
 		ptr = ptr_ring_consume(&tfile->tx_ring);
 		if (ptr)
 			break;
+
+		if (unlikely(netif_tx_queue_stopped(txq)))
+			netif_tx_wake_queue(txq);
+
 		if (signal_pending(current)) {
 			error = -ERESTARTSYS;
 			break;
@@ -2147,6 +2156,10 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
 	remove_wait_queue(&tfile->socket.wq.wait, &wait);
 
 out:
+	if (ptr_ring_empty(&tfile->tx_ring)) {
+		txq = netdev_get_tx_queue(tun->dev, tfile->queue_index);
+		netif_tx_wake_queue(txq);
+	}
 	*err = error;
 	return ptr;
 }
@@ -2165,7 +2178,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 
 	if (!ptr) {
 		/* Read frames from ring */
-		ptr = tun_ring_recv(tfile, noblock, &err);
+		ptr = tun_ring_recv(tun, tfile, noblock, &err);
 		if (!ptr)
 			return err;
 	}
-- 
2.43.0