Thread: 17 messages, 3 authors, 2023-07-31

Re: [PATCH net-next 3/3] net: tcp: check timeout by icsk->icsk_timeout in tcp_retransmit_timer()

From: Menglong Dong <hidden>
Date: 2023-07-31 08:24:55
Also in: lkml

On Fri, Jul 28, 2023 at 10:25 PM Neal Cardwell [off-list ref] wrote:
On Fri, Jul 28, 2023 at 1:50 AM Eric Dumazet [off-list ref] wrote:
[...]
In that packetdrill case AFAICT that is the ZWP timer firing, and the
sender sends a ZWP.

I think maybe Menglong is looking more at something like the following
scenario, where at the time the RTO timer fires the data sender finds
the tp->snd_wnd is zero, so it sends a retransmit of the
lowest-sequence data packet.

Here is a packetdrill case and the tcpdump trace on an upstream
net-next kernel... I have not worked out all the details at the end,
but perhaps it can help move the discussion forward:


~/packetdrill/gtests/net/tcp/receiver_window# cat rwin-rto-zero-window.pkt
// Test how sender reacts to unexpected arrival rwin of 0.

`../common/defaults.sh`

// Create a socket.
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

// Establish a connection.
  +.1 < S 0:0(0) win 65535 <mss 1000,nop,nop,sackOK,nop,wscale 6>
   +0 > S. 0:0(0) ack 1 win 65535 <mss 1460,nop,nop,sackOK,nop,wscale 14>
  +.1 < . 1:1(0) ack 1 win 457
   +0 accept(3, ..., ...) = 4

   +0 write(4, ..., 20000) = 20000
   +0 > P. 1:10001(10000) ack 1

// TLP
  +.2 > . 10001:11001(1000) ack 1
// Receiver has retracted rwin to 0
// (perhaps from the 2023 proposed OOM code?).
  +.1 < . 1:1(0) ack 1 win 0

// RTO, and in tcp_retransmit_timer() we see the receiver window is zero,
// so we take the special if (!tp->snd_wnd...) code path.
  +.2 > . 1:1001(1000) ack 1
  +.1 < . 1:1(0) ack 1 win 0

  +.5 > . 1:1001(1000) ack 1
  +.1 < . 1:1(0) ack 1 win 0

 +1.2 > . 1:1001(1000) ack 1
  +.1 < . 1:1(0) ack 1 win 0

 +2.4 > . 1:1001(1000) ack 1
  +.1 < . 1:1(0) ack 1 win 0

 +4.8 > . 1:1001(1000) ack 1
  +.1 < . 1:1(0) ack 1 win 0

 +9.6 > . 1:1001(1000) ack 1
  +.1 < . 1:1(0) ack 1 win 0

+19.2 > . 1:1001(1000) ack 1
  +.1 < . 1:1(0) ack 1 win 0

+38.4 > . 1:1001(1000) ack 1
  +.1 < . 1:1(0) ack 1 win 0

+76.8 > . 1:1001(1000) ack 1
  +.1 < . 1:1(0) ack 1 win 0

+120 > . 1:1001(1000) ack 1
 +.1 < . 1:1(0) ack 1 win 0

+120 > . 1:1001(1000) ack 1
 +.1 < . 1:1(0) ack 1001 win 1000

// Received non-zero window update. Send more data.
  +0 > P. 1001:3001(2000) ack 1
 +.1 < . 1:1(0) ack 3001 win 1000

----------
When I run that script on a net-next kernel I see the rounding up of
the RTO to 122 secs rather than 120 secs, but for whatever reason the
script does not cause the socket to die early...
I think I know the reason now. Without the 2nd patch that I sent
in this series, the ACK can't update the rwin to 0, as it will be
rejected by tcp_may_update_window().
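
To make the check concrete, here is a minimal Python model of the window-update test as I understand it from the unpatched code. It is only a sketch: the state values (snd_una = 1, snd_wl1 = 1, snd_wnd = 457 << 6) are assumptions read off the script using relative sequence numbers, not values dumped from a live socket.

```python
MASK32 = 0xFFFFFFFF


def after(a: int, b: int) -> bool:
    """Wraparound-safe 'a is later than b' for 32-bit sequence numbers,
    modelled on the kernel's signed 32-bit difference test."""
    return ((b - a) & MASK32) >= 0x80000000


def tcp_may_update_window(snd_una, snd_wl1, snd_wnd, ack, ack_seq, nwin):
    """Model of the unpatched check: accept the advertised window if the
    ACK advances snd_una, carries a newer sequence number, or (for the same
    sequence number) grows the window."""
    return (after(ack, snd_una)
            or after(ack_seq, snd_wl1)
            or (ack_seq == snd_wl1 and nwin > snd_wnd))


# Assumed sender state after the handshake (relative sequence numbers).
state = dict(snd_una=1, snd_wl1=1, snd_wnd=457 << 6)

# "ack 1 win 0": acknowledges nothing new and shrinks the window.
print(tcp_may_update_window(**state, ack=1, ack_seq=1, nwin=0))     # False
# "ack 1001 win 0": advances snd_una, so the zero window is accepted.
print(tcp_may_update_window(**state, ack=1001, ack_seq=1, nwin=0))  # True
```

In this model a pure "win 0" ACK that acknowledges nothing new never reaches tp->snd_wnd, while an ACK that advances snd_una does.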

However, you can send an ACK that acknowledges new data
to update the rwin to 0. I modified your script, and the socket dies as we
expected:

// Test how sender reacts to unexpected arrival rwin of 0.

`../common/defaults.sh`

// Create a socket.
    0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

// Establish a connection.
  +.1 < S 0:0(0) win 65535 <mss 1000,nop,nop,sackOK,nop,wscale 6>
   +0 > S. 0:0(0) ack 1 win 65535 <mss 1440,nop,nop,sackOK,nop,wscale 8>
  +.1 < . 1:1(0) ack 1 win 457
   +0 accept(3, ..., ...) = 4

   +0 write(4, ..., 20000) = 20000
   +0 > P. 1:10001(10000) ack 1

// Update the window to 0. "ack 0 win 0" won't update the window, as it
// will be ignored by tcp_may_update_window()
  +.1 < . 1:1(0) ack 1001 win 0

// RTO, and in tcp_retransmit_timer() we see the receiver window is zero,
// so we take the special if (!tp->snd_wnd...) code path.
  +.2 > . 1001:2001(1000) ack 1
  +.1 < . 1:1(0) ack 1001 win 0

  +.5 > . 1001:2001(1000) ack 1
  +.1 < . 1:1(0) ack 1001 win 0

 +1.2 > . 1001:2001(1000) ack 1
  +.1 < . 1:1(0) ack 1001 win 0

 +2.4 > . 1001:2001(1000) ack 1
  +.1 < . 1:1(0) ack 1001 win 0

 +4.8 > . 1001:2001(1000) ack 1
  +.1 < . 1:1(0) ack 1001 win 0

 +9.6 > . 1001:2001(1000) ack 1
  +.1 < . 1:1(0) ack 1001 win 0

+19.2 > . 1001:2001(1000) ack 1
  +.1 < . 1:1(0) ack 1001 win 0

+38.4 > . 1001:2001(1000) ack 1
  +.1 < . 1:1(0) ack 1001 win 0

+76.8 > . 1001:2001(1000) ack 1
  +.1 < . 1:1(0) ack 1001 win 0

// socket will die in tcp_retransmit_timer() in the
// "tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX" code path.
// Following retransmit won't happen.
+120 > . 1001:2001(1000) ack 1
 +.1 < . 1:1(0) ack 1001 win 0
------------------------------------------------------------------------------

I don't know how to check for the death of the socket with
packetdrill, so I checked it with:
  ss -nitme | grep 8080 | grep on
And I can see the socket die when the 120-second timer
expires.
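
As a rough illustration of why it is the last timer that kills the socket, here is a small Python sketch of the "tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX" test. It works in seconds instead of jiffies and assumes that every incoming ACK refreshes tp->rcv_tstamp. The idle times are the In -> Out deltas from the tcpdump trace quoted below, which I assume are close to what the modified script produces. first_fatal_timer() is a made-up helper name, not a kernel function.

```python
TCP_RTO_MAX_SECONDS = 120.0  # TCP_RTO_MAX is 120 * HZ jiffies in the kernel


def first_fatal_timer(idle_at_fire, rto_max=TCP_RTO_MAX_SECONDS):
    """Return the index of the first RTO firing at which the time since the
    last ACK strictly exceeds rto_max, or None if no firing does."""
    for index, idle in enumerate(idle_at_fire):
        if idle > rto_max:
            return index
    return None


# Seconds between each zero-window ACK and the next retransmit,
# copied from the quoted tcpdump trace.
observed_idle = [0.203824, 0.507835, 1.115858, 2.331747, 4.955980,
                 9.627985, 19.355725, 42.395633, 77.724059, 122.779516]

print(first_fatal_timer(observed_idle))  # 9, i.e. the tenth firing
```

Note that the comparison is strict: with an idle time of exactly 120 seconds the sketch returns None, so in this model it is the rounding of the timer to about 122 seconds that pushes the last firing over the limit.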

$ packetdrill ./rwin-rto-zero-window.pkt
./rwin-rto-zero-window.pkt:55: error handling packet: Timed out waiting for packet

The tcpdump trace:

 tcpdump -ttt -n -i any port 8080 &

->

~/packetdrill/gtests/net/tcp/receiver_window#
../../packetdrill/packetdrill rwin-rto-zero-window.pkt
 00:01:01.370344 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [S], seq 0, win 65535, options [mss
1000,nop,nop,sackOK,nop,wscale 6], length 0
 00:00:00.000096 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [S.], seq 3847169154, ack 1, win 65535, options [mss
1460,nop,nop,sackOK,nop,wscale 14], length 0
 00:00:00.100277 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 457, length 0
 00:00:00.000090 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [P.], seq 1:2001, ack 1, win 4, length 2000: HTTP
 00:00:00.000006 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [P.], seq 2001:4001, ack 1, win 4, length 2000: HTTP
 00:00:00.000003 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [P.], seq 4001:6001, ack 1, win 4, length 2000: HTTP
 00:00:00.000002 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [P.], seq 6001:8001, ack 1, win 4, length 2000: HTTP
 00:00:00.000001 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [P.], seq 8001:10001, ack 1, win 4, length 2000: HTTP
 00:00:00.209131 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 10001:11001, ack 1, win 4, length 1000: HTTP
 00:00:00.100190 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 0, length 0
 00:00:00.203824 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
 00:00:00.100175 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 0, length 0
 00:00:00.507835 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
 00:00:00.100192 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 0, length 0
 00:00:01.115858 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
 00:00:00.100182 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 0, length 0
 00:00:02.331747 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
 00:00:00.100198 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 0, length 0
 00:00:04.955980 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
 00:00:00.100197 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 0, length 0
 00:00:09.627985 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
 00:00:00.100179 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 0, length 0
 00:00:19.355725 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
 00:00:00.100203 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 0, length 0
 00:00:42.395633 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
 00:00:00.100202 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 0, length 0
 00:01:17.724059 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
 00:00:00.100201 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 0, length 0
 00:02:02.779516 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
 00:00:00.100229 tun0  In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1, win 0, length 0
 00:02:02.779828 tun0  Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 1:1001, ack 1, win 4, length 1000: HTTP
 00:00:00.100230 ?     In  IP 192.0.2.1.51231 > 192.168.56.132.8080:
Flags [.], ack 1001, win 1000, length 0
 00:00:00.000034 ?     Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 11001:12001, ack 1, win 4, length 1000: HTTP
 00:00:00.000005 ?     Out IP 192.168.56.132.8080 > 192.0.2.1.51231:
Flags [.], seq 12001:13001, ack 1, win 4, length 1000: HTTP

rwin-rto-zero-window.pkt:62: error handling packet: live packet field
tcp_psh: expected: 1 (0x1) vs actual: 0 (0x0)
script packet: 405.390244 P. 1001:3001(2000) ack 1
actual packet: 405.390237 . 11001:13001(2000) ack 1 win 4