Re: [PATCH net-next] tcp: fix spurious connection aborts due to TCP_USER_TIMEOUT and zero window

From: Eric Dumazet <edumazet@google.com>
Date: 2026-05-27 13:40:25
Subsystem: networking [general], networking [tcp], the rest · Maintainers: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Neal Cardwell, Linus Torvalds

On Wed, May 27, 2026 at 1:16 AM Zhi Cheng [off-list ref] wrote:
Under certain conditions, a stale icsk_probes_tstamp can lead to an
unexpected connection abort during a zero-window state.

The exact sequence leading to the timeout is as follows:
1. A zero window occurs. icsk_probes_tstamp is set.
2. The window opens slightly. tcp_ack_probe() is called because there
    are no in-flight packets. However, icsk_probes_tstamp is not cleared
    because the window is not large enough to fit the tcp_send_head.
3. Packets are sent, and the RTO is armed, which overrides the probe timer.
4. Subsequent ACKs consistently acknowledge packets and the window
    never fully closes. Because there are now in-flight packets,
    tcp_ack_probe() is never called again.
5. As a result, icsk_probes_tstamp is never updated even though the
    connection is no longer in the zero-window state.
6. Much later, another zero window occurs. When the probe timer fires,
    tcp_probe_timer() evaluates the extremely old icsk_probes_tstamp and
    immediately aborts the connection due to TCP_USER_TIMEOUT.

Fix this by explicitly clearing icsk_probes_tstamp in tcp_ack()
whenever prior_packets is non-zero, ensuring that the probe
timestamp is reset when the connection exits the zero-window state.

Fixes: 9d9b1ee0b2d1 ("tcp: fix TCP_USER_TIMEOUT with zero window")
Signed-off-by: Zhi Cheng <redacted>
---
  net/ipv4/tcp_input.c | 1 +
  1 file changed, 1 insertion(+)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index de9f68a9c0cf..02de64881b76 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4365,6 +4365,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
        tp->rcv_tstamp = tcp_jiffies32;
        if (!prior_packets)
                goto no_queue;
+       icsk->icsk_probes_tstamp = 0;

        /* See if we can take anything off of the retransmit queue. */
        flag |= tcp_clean_rtx_queue(sk, skb, prior_fack, prior_snd_una,

tcp_ack() is in the TCP fast path, and icsk_probes_tstamp has not been
touched in the TCP fast path so far...
diff --git a/Documentation/networking/net_cachelines/inet_connection_sock.rst b/Documentation/networking/net_cachelines/inet_connection_sock.rst
index cc2000f55c29879a12c0e4d238242b01cee18091..dfb2ecb4c1621f2eac2f3183ed63057af90dba76 100644
--- a/Documentation/networking/net_cachelines/inet_connection_sock.rst
+++ b/Documentation/networking/net_cachelines/inet_connection_sock.rst
@@ -45,7 +45,7 @@ struct icsk_mtup_int                search_low             read_write
 struct icsk_mtup_u32:31             probe_size             read_write                              tcp_mtup_init,tcp_connect_init,__tcp_transmit_skb
 struct icsk_mtup_u32:1              enabled                read_write                              tcp_mtup_init,tcp_sync_mss,tcp_connect_init,tcp_mtu_probe,tcp_write_xmit
 struct icsk_mtup_u32                probe_timestamp        read_write                              tcp_mtup_init,tcp_connect_init,tcp_mtu_check_reprobe,tcp_mtu_probe
-u32                                 icsk_probes_tstamp
+u32                                 icsk_probes_tstamp                         read_write          tcp_ack
 u32                                 icsk_user_timeout
 u64[104/sizeof(u64)]                icsk_ca_priv
 =================================== ====================== =================== =================== ========================================================================================================================================================

An alternative would be to clear icsk_probes_tstamp in a less hot path.

Perhaps:
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f063eccbbba340b39abc79b5541adca369d63d7c..751a407f64c2ba90e7b72d48942506870a994a2e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1631,6 +1631,13 @@ static inline void tcp_reset_xmit_timer(struct sock *sk,
                                        unsigned long when,
                                        bool pace_delay)
 {
+       if (what != ICSK_TIME_PROBE0) {
+               struct inet_connection_sock *icsk = inet_csk(sk);
+
+               if (icsk->icsk_pending == ICSK_TIME_PROBE0)
+                       icsk->icsk_probes_tstamp = 0;
+       }
+
        if (pace_delay)
                when += tcp_pacing_delay(sk);
        inet_csk_reset_xmit_timer(sk, what, when,
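
To make the six-step sequence and the suggested clearing easier to follow,
here is a minimal user-space sketch. It is not kernel code: struct toy_sock,
the toy_* helpers, the TOY_* timer values, the tick values and the 30000-tick
user timeout are all hypothetical stand-ins, and the probe-timer check is a
simplified reading of what the patch description says tcp_probe_timer() does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel timer kinds; values are illustrative. */
enum toy_timer { TOY_NONE = 0, TOY_RETRANS = 1, TOY_PROBE0 = 3 };

struct toy_sock {
	enum toy_timer pending;  /* models icsk_pending */
	uint32_t probes_tstamp;  /* models icsk_probes_tstamp, 0 means unset */
	uint32_t user_timeout;   /* models icsk_user_timeout, in ticks */
};

/*
 * Models arming a transmit timer. With clear_on_switch set, replacing a
 * pending PROBE0 timer by any other timer also clears the probe timestamp,
 * which is the idea behind the tcp_reset_xmit_timer() suggestion.
 */
static void toy_reset_xmit_timer(struct toy_sock *sk, enum toy_timer what,
				 bool clear_on_switch)
{
	if (clear_on_switch && what != TOY_PROBE0 && sk->pending == TOY_PROBE0)
		sk->probes_tstamp = 0;
	sk->pending = what;
}

/* Models entering a zero-window state: stamp only if unset, arm PROBE0. */
static void toy_zero_window(struct toy_sock *sk, uint32_t now,
			    bool clear_on_switch)
{
	if (!sk->probes_tstamp)
		sk->probes_tstamp = now;
	toy_reset_xmit_timer(sk, TOY_PROBE0, clear_on_switch);
}

/* Models the user-timeout check in the probe timer: true means abort. */
static bool toy_probe_timer_aborts(const struct toy_sock *sk, uint32_t now)
{
	if (!sk->probes_tstamp || !sk->user_timeout)
		return false;
	return (int32_t)(now - sk->probes_tstamp) >= (int32_t)sk->user_timeout;
}

/* Replays steps 1 to 6; returns true if the connection would be aborted. */
bool toy_run(bool clear_on_switch)
{
	struct toy_sock sk = { TOY_NONE, 0, 30000 };

	/* Step 1: zero window at tick 1000, timestamp is set. */
	toy_zero_window(&sk, 1000, clear_on_switch);
	/* Steps 2-3: window opens slightly, data is sent, RTO replaces PROBE0. */
	toy_reset_xmit_timer(&sk, TOY_RETRANS, clear_on_switch);
	/* Steps 4-5: a long stretch of normal traffic; nothing touches the stamp. */
	/* Step 6: a new zero window at tick 600000, probe timer fires at 600200. */
	toy_zero_window(&sk, 600000, clear_on_switch);
	return toy_probe_timer_aborts(&sk, 600200);
}
```

In this model toy_run(false) returns true, because the timestamp from tick
1000 is still in place when the second zero window is checked, while
toy_run(true) returns false, because the timestamp was cleared when the RTO
replaced the probe timer and is stamped afresh at tick 600000.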