
Re: [PATCH net-next] tcp: fix spurious connection aborts due to TCP_USER_TIMEOUT and zero window

From: chengzhi <hidden>
Date: 2026-05-28 07:59:36
Subsystem: networking [general], networking [tcp], the rest · Maintainers: "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, Neal Cardwell, Linus Torvalds

On 2026-5-27 21:40, Eric Dumazet wrote:
On Wed, May 27, 2026 at 1:16 AM Zhi Cheng [off-list ref] wrote:
Under certain conditions, a stale icsk_probes_tstamp can lead to an
unexpected connection abort during a zero-window state.

The exact sequence leading to the timeout is as follows:
1. A zero window occurs. icsk_probes_tstamp is set.
2. The window opens slightly. tcp_ack_probe() is called because there
     are no in-flight packets. However, icsk_probes_tstamp is not cleared
     because the window is not large enough to fit the tcp_send_head.
3. Packets are sent, and the RTO is armed, which overrides the probe timer.
4. Subsequent ACKs consistently acknowledge packets and the window
     never fully closes. Because there are now in-flight packets,
     tcp_ack_probe() is never called again.
5. As a result, icsk_probes_tstamp is never updated despite the
     connection no longer being in the zero-window state.
6. Much later, another zero window occurs. When the probe timer fires,
     tcp_probe_timer() evaluates the extremely old icsk_probes_tstamp and
     immediately aborts the connection due to TCP_USER_TIMEOUT.
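
The six steps above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not kernel code: `struct model_sock`, `model_probe_timer()`, `model_ack()` and `model_run_scenario()` are hypothetical names, time is counted in plain ticks, and the logic is reduced to the timestamp handling described in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields named in the thread. */
struct model_sock {
	uint32_t probes_tstamp; /* models icsk_probes_tstamp, 0 = not set */
	uint32_t user_timeout;  /* models icsk_user_timeout, in ticks here */
	uint32_t packets_out;   /* models the number of in-flight packets */
};

/* Models the user-timeout check of the probe timer; true means "abort". */
static bool model_probe_timer(struct model_sock *sk, uint32_t now)
{
	if (!sk->probes_tstamp) {
		sk->probes_tstamp = now; /* first probe of this episode */
		return false;
	}
	return sk->user_timeout &&
	       (int32_t)(now - sk->probes_tstamp) >= (int32_t)sk->user_timeout;
}

/* Models ACK processing; apply_fix mirrors the one-line change proposed here. */
static void model_ack(struct model_sock *sk, bool head_fits, bool apply_fix)
{
	if (!sk->packets_out) {
		/* Probe-ACK path: cleared only if the send head fits the window. */
		if (head_fits)
			sk->probes_tstamp = 0;
		return;
	}
	if (apply_fix)
		sk->probes_tstamp = 0; /* in-flight data was ACKed */
}

/* Replays steps 1-6; returns true if the second zero window aborts at once. */
static bool model_run_scenario(bool apply_fix)
{
	struct model_sock sk = { .user_timeout = 30000 };
	uint32_t now = 1000;

	assert(sk.user_timeout > 0);
	if (model_probe_timer(&sk, now))   /* 1. zero window, timestamp set */
		return true;
	model_ack(&sk, false, apply_fix);  /* 2. small window, head does not fit */
	sk.packets_out = 1;                /* 3. packets are sent, RTO armed */
	now += 10;
	model_ack(&sk, true, apply_fix);   /* 4. ACKs arrive with data in flight */
	sk.packets_out = 0;                /* 5. timestamp is stale unless fixed */
	now += 600000;                     /* 6. much later: new zero window */
	return model_probe_timer(&sk, now);
}
```

In this model, `model_run_scenario(false)` reports an immediate abort on the second zero window, while `model_run_scenario(true)` restarts the timestamp instead.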

Fix this by explicitly clearing icsk_probes_tstamp in tcp_ack()
whenever prior_packets is non-zero, ensuring that the probe
timestamp is reset when the connection exits the zero-window state.

Fixes: 9d9b1ee0b2d1 ("tcp: fix TCP_USER_TIMEOUT with zero window")
Signed-off-by: Zhi Cheng <redacted>
---
   net/ipv4/tcp_input.c | 1 +
   1 file changed, 1 insertion(+)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index de9f68a9c0cf..02de64881b76 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4365,6 +4365,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
         tp->rcv_tstamp = tcp_jiffies32;
         if (!prior_packets)
                 goto no_queue;
+       icsk->icsk_probes_tstamp = 0;

         /* See if we can take anything off of the retransmit queue. */
         flag |= tcp_clean_rtx_queue(sk, skb, prior_fack, prior_snd_una,
tcp_ack() is the TCP fast path, and icsk_probes_tstamp was not yet touched
in the TCP fast path.

icsk_probes_out is already touched in the fast path; maybe it could also
be moved out to the slow path?
diff --git a/Documentation/networking/net_cachelines/inet_connection_sock.rst b/Documentation/networking/net_cachelines/inet_connection_sock.rst
index cc2000f55c29879a12c0e4d238242b01cee18091..dfb2ecb4c1621f2eac2f3183ed63057af90dba76 100644
--- a/Documentation/networking/net_cachelines/inet_connection_sock.rst
+++ b/Documentation/networking/net_cachelines/inet_connection_sock.rst
@@ -45,7 +45,7 @@ struct icsk_mtup_int                search_low             read_write
  struct icsk_mtup_u32:31             probe_size             read_write          tcp_mtup_init,tcp_connect_init,__tcp_transmit_skb
  struct icsk_mtup_u32:1              enabled                read_write          tcp_mtup_init,tcp_sync_mss,tcp_connect_init,tcp_mtu_probe,tcp_write_xmit
  struct icsk_mtup_u32                probe_timestamp        read_write          tcp_mtup_init,tcp_connect_init,tcp_mtu_check_reprobe,tcp_mtu_probe
-u32                                 icsk_probes_tstamp
+u32                                 icsk_probes_tstamp     read_write          tcp_ack
  u32                                 icsk_user_timeout
  u64[104/sizeof(u64)]                icsk_ca_priv
  =================================== ====================== =================== =================== ========================================================================================================================================================

An alternative would be to clear icsk_probes_tstamp in a less hot path.

Perhaps:
diff --git a/include/net/tcp.h b/include/net/tcp.h
index f063eccbbba340b39abc79b5541adca369d63d7c..751a407f64c2ba90e7b72d48942506870a994a2e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1631,6 +1631,13 @@ static inline void tcp_reset_xmit_timer(struct sock *sk,
                                         unsigned long when,
                                         bool pace_delay)
  {
+       if (what != ICSK_TIME_PROBE0) {
+               struct inet_connection_sock *icsk = inet_csk(sk);
+
+               if (icsk->icsk_pending == ICSK_TIME_PROBE0)
+                       icsk->icsk_probes_tstamp = 0;
+       }
+
         if (pace_delay)
                 when += tcp_pacing_delay(sk);
         inet_csk_reset_xmit_timer(sk, what, when,
What if data is received during the zero-window state and memory pressure
causes ICSK_TIME_DACK to be armed by __tcp_send_ack()? I don't think
icsk_probes_tstamp should be cleared in that case.

Maybe checking packets_out directly?
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 98848db62894..f91c7202365e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1622,6 +1622,11 @@ static inline void tcp_reset_xmit_timer(struct sock *sk,
                                         unsigned long when,
                                         bool pace_delay)
  {
+       struct inet_connection_sock *icsk = inet_csk(sk);
+       struct tcp_sock *tp = tcp_sk(sk);
+       if (tp->packets_out)
+               icsk->icsk_probes_tstamp = 0;
+
         if (pace_delay)
                 when += tcp_pacing_delay(sk);
         inet_csk_reset_xmit_timer(sk, what, when,
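
To make the difference between the two proposed conditions concrete, here is a rough user-space sketch. It rests on assumptions: the `MODEL_TIME_*` values, `struct model_timer_state` and the `model_*` functions are hypothetical stand-ins for the kernel definitions, and the model only captures the decision to clear the timestamp when a timer is rearmed, not the timer update itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ICSK_TIME_* timer kinds. */
enum model_timer {
	MODEL_TIME_NONE,
	MODEL_TIME_RETRANS,
	MODEL_TIME_DACK,
	MODEL_TIME_PROBE0,
};

struct model_timer_state {
	enum model_timer pending; /* models icsk_pending */
	uint32_t packets_out;     /* models tp->packets_out */
	uint32_t probes_tstamp;   /* models icsk_probes_tstamp */
};

typedef void (*model_reset_fn)(struct model_timer_state *, enum model_timer);

/* First proposal: clear when a non-probe timer replaces a pending probe timer. */
static void model_reset_by_pending(struct model_timer_state *s,
				   enum model_timer what)
{
	if (what != MODEL_TIME_PROBE0 && s->pending == MODEL_TIME_PROBE0)
		s->probes_tstamp = 0;
}

/* Second proposal: clear whenever packets are in flight. */
static void model_reset_by_packets_out(struct model_timer_state *s,
				       enum model_timer what)
{
	(void)what; /* the timer kind is not consulted in this variant */
	if (s->packets_out)
		s->probes_tstamp = 0;
}

/*
 * Starts from a zero-window episode (probe timer pending, timestamp 1000),
 * rearms the given timer, and returns the resulting timestamp.
 */
static uint32_t model_tstamp_after(model_reset_fn reset, enum model_timer what,
				   uint32_t packets_out)
{
	struct model_timer_state s = {
		.pending = MODEL_TIME_PROBE0,
		.packets_out = packets_out,
		.probes_tstamp = 1000,
	};

	assert(reset);
	reset(&s, what);
	return s.probes_tstamp;
}
```

With nothing in flight and a delayed-ACK timer being armed, `model_reset_by_pending` drops the timestamp while `model_reset_by_packets_out` keeps it. Once data is in flight and a retransmit timer is armed, both variants clear it.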