Re: [PATCH net-next] tcp: fix spurious connection aborts due to TCP_USER_TIMEOUT and zero window
From: chengzhi <hidden>
Date: 2026-05-28 07:59:36
Subsystem: networking [general], networking [tcp], the rest
Maintainers: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Neal Cardwell, Linus Torvalds
On 2026-5-27 21:40, Eric Dumazet wrote:
On Wed, May 27, 2026 at 1:16 AM Zhi Cheng [off-list ref] wrote:
Under certain conditions, a stale icsk_probes_tstamp can lead to an unexpected connection abort during a zero-window state. The exact sequence leading to the timeout is as follows:

1. A zero window occurs, and icsk_probes_tstamp is set.
2. The window opens slightly. tcp_ack_probe() is called because there are no in-flight packets. However, icsk_probes_tstamp is not cleared because the window is not large enough to fit tcp_send_head.
3. Packets are sent, and the RTO timer is armed, which overrides the probe timer.
4. Subsequent ACKs consistently acknowledge packets, and the window never fully closes. Because there are now in-flight packets, tcp_ack_probe() is never called again.
5. As a result, icsk_probes_tstamp is never updated, even though the connection is no longer in the zero-window state.
6. Much later, another zero window occurs. When the probe timer fires, tcp_probe_timer() evaluates the extremely old icsk_probes_tstamp and immediately aborts the connection due to TCP_USER_TIMEOUT.

Fix this by explicitly clearing icsk_probes_tstamp in tcp_ack() whenever prior_packets is non-zero, ensuring that the probe timestamp is reset when the connection exits the zero-window state.

Fixes: 9d9b1ee0b2d1 ("tcp: fix TCP_USER_TIMEOUT with zero window")
Signed-off-by: Zhi Cheng <redacted>
---
 net/ipv4/tcp_input.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index de9f68a9c0cf..02de64881b76 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4365,6 +4365,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
 	tp->rcv_tstamp = tcp_jiffies32;
 	if (!prior_packets)
 		goto no_queue;
+	icsk->icsk_probes_tstamp = 0;
 
 	/* See if we can take anything off of the retransmit queue. */
 	flag |= tcp_clean_rtx_queue(sk, skb, prior_fack, prior_snd_una,

tcp_ack() is the TCP fast path, and icsk_probes_tstamp was not yet touched in the TCP fast path...
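A minimal user-space sketch of steps 1-6 in the quoted description follows. It compares ACK handling with and without the proposed clear when prior_packets is non-zero. All names (toy_sock, toy_probe_timer(), toy_ack(), replay()) are hypothetical. toy_probe_timer() only loosely follows the TCP_USER_TIMEOUT check in tcp_probe_timer(), and probe counts, backoff and real timer arming are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, heavily simplified stand-in for the socket state involved.
 * Times are plain tick counts; 0 in probes_tstamp means "not set".
 */
struct toy_sock {
	uint32_t probes_tstamp;	/* models icsk->icsk_probes_tstamp */
	uint32_t user_timeout;	/* models TCP_USER_TIMEOUT, in ticks */
	uint32_t packets_out;	/* models tp->packets_out */
};

/*
 * Loosely follows the TCP_USER_TIMEOUT check in tcp_probe_timer():
 * the first firing records a timestamp, later firings compare against it.
 * Returns true if the connection would be aborted.
 */
static bool toy_probe_timer(struct toy_sock *s, uint32_t now)
{
	if (!s->probes_tstamp) {
		s->probes_tstamp = now;
		return false;
	}
	return s->user_timeout &&
	       (int32_t)(now - s->probes_tstamp) >= (int32_t)s->user_timeout;
}

/* ACK handling; 'fixed' enables the proposed clear when prior_packets != 0. */
static void toy_ack(struct toy_sock *s, uint32_t prior_packets, bool fixed)
{
	if (fixed && prior_packets)
		s->probes_tstamp = 0;
}

/* Replays steps 1-6; returns true if the final probe aborts the connection. */
static bool replay(bool fixed)
{
	struct toy_sock s = { .user_timeout = 100 };

	toy_probe_timer(&s, 10);		/* 1: zero window, timestamp recorded */
	/* 2: window opens slightly; tcp_ack_probe() does not clear it */
	s.packets_out = 4;			/* 3: data sent, RTO armed instead */
	toy_ack(&s, s.packets_out, fixed);	/* 4-5: ACKs for in-flight data */
	s.packets_out = 0;			/* all acked, window closes again */
	return toy_probe_timer(&s, 1000);	/* 6: much later, probe timer fires */
}

/* Usage: assert(replay(false)); assert(!replay(true)); */
```

With these assumed values (a 100-tick user timeout, first probe at tick 10, next probe at tick 1000), replay(false) reports an abort because the timestamp from tick 10 is still in place, while replay(true) simply records a fresh timestamp.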
icsk_probes_out is already touched in the fast path; maybe it could be moved out to the slow path as well?
diff --git a/Documentation/networking/net_cachelines/inet_connection_sock.rst b/Documentation/networking/net_cachelines/inet_connection_sock.rst
index cc2000f55c29879a12c0e4d238242b01cee18091..dfb2ecb4c1621f2eac2f3183ed63057af90dba76 100644
--- a/Documentation/networking/net_cachelines/inet_connection_sock.rst
+++ b/Documentation/networking/net_cachelines/inet_connection_sock.rst
@@ -45,7 +45,7 @@ struct icsk_mtup_int search_low read_write
 struct icsk_mtup_u32:31 probe_size read_write tcp_mtup_init,tcp_connect_init,__tcp_transmit_skb
 struct icsk_mtup_u32:1 enabled read_write tcp_mtup_init,tcp_sync_mss,tcp_connect_init,tcp_mtu_probe,tcp_write_xmit
 struct icsk_mtup_u32 probe_timestamp read_write tcp_mtup_init,tcp_connect_init,tcp_mtu_check_reprobe,tcp_mtu_probe
-u32 icsk_probes_tstamp
+u32 icsk_probes_tstamp read_write tcp_ack
 u32 icsk_user_timeout
 u64[104/sizeof(u64)] icsk_ca_priv
 =================================== ====================== =================== =================== ========================================================================================================================================================

An alternative would be to clear icsk_probes_tstamp in a less hot path. Perhaps:

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f063eccbbba340b39abc79b5541adca369d63d7c..751a407f64c2ba90e7b72d48942506870a994a2e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1631,6 +1631,13 @@ static inline void tcp_reset_xmit_timer(struct sock *sk,
 					unsigned long when,
 					bool pace_delay)
 {
+	if (what != ICSK_TIME_PROBE0) {
+		struct inet_connection_sock *icsk = inet_csk(sk);
+
+		if (icsk->icsk_pending == ICSK_TIME_PROBE0)
+			icsk->icsk_probes_tstamp = 0;
+	}
+
 	if (pace_delay)
 		when += tcp_pacing_delay(sk);
 	inet_csk_reset_xmit_timer(sk, what, when,
What if data is received during the zero-window state and memory pressure causes ICSK_TIME_DACK to be armed by __tcp_send_ack()? I don't think icsk_probes_tstamp should be cleared in that case. Maybe check packets_out directly instead?
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 98848db62894..f91c7202365e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1622,6 +1622,12 @@ static inline void tcp_reset_xmit_timer(struct sock *sk,
 					unsigned long when,
 					bool pace_delay)
 {
+	struct inet_connection_sock *icsk = inet_csk(sk);
+	struct tcp_sock *tp = tcp_sk(sk);
+
+	if (tp->packets_out)
+		icsk->icsk_probes_tstamp = 0;
+
 	if (pace_delay)
 		when += tcp_pacing_delay(sk);
 	inet_csk_reset_xmit_timer(sk, what, when,
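The two clearing conditions discussed in this thread can be compared side by side in a small user-space model. This is a sketch only. toy_timer, toy_state and the two predicate functions are hypothetical names, and the model evaluates each condition exactly as written in its diff. It assumes, as the delayed-ACK question does, that the helper is reached with a non-probe timer while the probe timer is pending. It does not model whether __tcp_send_ack() actually goes through tcp_reset_xmit_timer(), or what icsk_pending holds afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ICSK_TIME_* timer kinds. */
enum toy_timer { TOY_TIME_RETRANS, TOY_TIME_DACK, TOY_TIME_PROBE0 };

struct toy_state {
	enum toy_timer pending;	/* models icsk->icsk_pending */
	uint32_t packets_out;	/* models tp->packets_out */
};

/* Condition from the icsk_pending-based proposal. */
static bool clear_on_leaving_probe0(const struct toy_state *st,
				    enum toy_timer what)
{
	return what != TOY_TIME_PROBE0 && st->pending == TOY_TIME_PROBE0;
}

/* Condition from the packets_out-based proposal; 'what' is not consulted. */
static bool clear_on_packets_out(const struct toy_state *st,
				 enum toy_timer what)
{
	(void)what;
	return st->packets_out != 0;
}

/*
 * Usage:
 *   Zero window, probe timer pending, nothing in flight, DACK requested:
 *     struct toy_state zw = { TOY_TIME_PROBE0, 0 };
 *     assert(clear_on_leaving_probe0(&zw, TOY_TIME_DACK));
 *     assert(!clear_on_packets_out(&zw, TOY_TIME_DACK));
 *   Window reopened, data in flight, RTO replacing the probe timer:
 *     struct toy_state rx = { TOY_TIME_PROBE0, 4 };
 *     assert(clear_on_leaving_probe0(&rx, TOY_TIME_RETRANS));
 *     assert(clear_on_packets_out(&rx, TOY_TIME_RETRANS));
 */
```

Under that assumption, the two conditions agree when data is in flight. They differ only in the zero-window, nothing-in-flight case raised in the delayed-ACK question.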