Re: [PATCH net-next v4] tcp: extend tcp_retransmit_skb tracepoint with failure reasons
From: Jakub Kicinski <kuba@kernel.org>
Date: 2025-07-14 23:46:26
Also in:
linux-trace-kernel, lkml
On Thu, 10 Jul 2025 10:01:38 +0800 (CST) fan.yu9@zte.com.cn wrote:
Background
==========
When TCP retransmits a packet due to missing ACKs, the
retransmission may fail for various reasons (e.g., packets
stuck in driver queues, sequence errors, or routing issues).
The original tcp_retransmit_skb tracepoint, added in
commit e086101b150a ("tcp: add a tracepoint for tcp retransmission"),
lacks visibility into these failure causes, making production
diagnostics difficult.
Solution
========
Add a "result" field to the tcp_retransmit_skb tracepoint
that enumerates explicit failure cases:
TCP_RETRANS_ERR_DEFAULT (retransmit terminated unexpectedly)
TCP_RETRANS_IN_HOST_QUEUE (packet still queued in driver)
TCP_RETRANS_END_SEQ_ERROR (invalid end sequence)
TCP_RETRANS_NOMEM (out of memory during retransmit)
TCP_RETRANS_ROUTE_FAIL (routing failure)
TCP_RETRANS_RCV_ZERO_WINDOW (closed receiver window)
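
For reference, the list above could be sketched in userspace C roughly as
follows. This is only an illustration: the six constant names come from the
patch description, but the enum tag, the ordering of the values, the
TCP_RETRANS_RESULT_MAX sentinel, and the tcp_retrans_result_name() helper are
assumptions made here, not taken from the patch.

```c
#include <stddef.h>

/* Failure reasons named in the patch description.
 * The enum tag and the numeric ordering are assumptions. */
enum tcp_retrans_result {
	TCP_RETRANS_ERR_DEFAULT,
	TCP_RETRANS_IN_HOST_QUEUE,
	TCP_RETRANS_END_SEQ_ERROR,
	TCP_RETRANS_NOMEM,
	TCP_RETRANS_ROUTE_FAIL,
	TCP_RETRANS_RCV_ZERO_WINDOW,
	TCP_RETRANS_RESULT_MAX	/* hypothetical sentinel, not in the patch */
};

/* Hypothetical helper: map a result code to a printable name,
 * similar in spirit to what a tracepoint's symbolic printing does. */
static const char *tcp_retrans_result_name(int result)
{
	static const char *const names[TCP_RETRANS_RESULT_MAX] = {
		[TCP_RETRANS_ERR_DEFAULT]     = "TCP_RETRANS_ERR_DEFAULT",
		[TCP_RETRANS_IN_HOST_QUEUE]   = "TCP_RETRANS_IN_HOST_QUEUE",
		[TCP_RETRANS_END_SEQ_ERROR]   = "TCP_RETRANS_END_SEQ_ERROR",
		[TCP_RETRANS_NOMEM]           = "TCP_RETRANS_NOMEM",
		[TCP_RETRANS_ROUTE_FAIL]      = "TCP_RETRANS_ROUTE_FAIL",
		[TCP_RETRANS_RCV_ZERO_WINDOW] = "TCP_RETRANS_RCV_ZERO_WINDOW",
	};

	/* Anything outside the known range is reported as unknown. */
	if (result < 0 || result >= TCP_RETRANS_RESULT_MAX)
		return "UNKNOWN";
	return names[result];
}
```
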
Have you tried to use this or perform some analysis of which of these
reasons actually make sense to add? I'd venture a guess that
IN_HOST_QUEUE will dominate in the datacenter. Maybe RCV_ZERO_WINDOW
can happen. Tracing ENOMEM is a waste of time, so is this:
 	if (unlikely(before(TCP_SKB_CB(skb)->end_seq, tp->snd_una))) {
>>>>>		WARN_ON_ONCE(1);	<<<<<<<<
-		return -EINVAL;
+		result = TCP_RETRANS_END_SEQ_ERROR;
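
For context on the condition guarded by that WARN_ON_ONCE(): before() is a
wraparound-safe comparison of 32-bit sequence numbers, so the branch fires
only when the skb's end sequence lies behind snd_una. A minimal userspace
sketch of the same arithmetic, where seq_before() and end_seq_is_stale() are
names made up for this illustration rather than kernel symbols:

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "seq1 comes before seq2" for 32-bit sequence numbers:
 * the unsigned difference is reinterpreted as signed, so the result stays
 * correct across the 2^32 wrap as long as the two values are less than
 * 2^31 apart. */
static inline bool seq_before(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) < 0;
}

/* Hypothetical stand-in for the quoted check: true when the segment's
 * end sequence is already behind the oldest unacknowledged byte. */
static inline bool end_seq_is_stale(uint32_t end_seq, uint32_t snd_una)
{
	return seq_before(end_seq, snd_una);
}
```

Since the existing code already warns once when this happens, the condition
points at a stack bug rather than a runtime failure mode worth counting.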
--
pw-bot: cr