Thread (17 messages) 17 messages, 3 authors, 2023-07-31

Re: [PATCH net-next 3/3] net: tcp: check timeout by icsk->icsk_timeout in tcp_retransmit_timer()

From: Menglong Dong <hidden>
Date: 2023-07-31 03:41:57
Also in: lkml

On Sat, Jul 29, 2023 at 1:15 AM Neal Cardwell [off-list ref] wrote:
On Fri, Jul 28, 2023 at 7:25 AM Neal Cardwell [off-list ref] wrote:
quoted
On Fri, Jul 28, 2023 at 1:50 AM Eric Dumazet [off-list ref] wrote:
quoted
On Fri, Jul 28, 2023 at 8:25 AM Menglong Dong [off-list ref] wrote:
quoted
On Fri, Jul 28, 2023 at 12:44 PM Neal Cardwell [off-list ref] wrote:
quoted
On Thu, Jul 27, 2023 at 7:57 PM Menglong Dong [off-list ref] wrote:
quoted
On Fri, Jul 28, 2023 at 3:31 AM Eric Dumazet [off-list ref] wrote:
quoted
On Thu, Jul 27, 2023 at 2:52 PM [off-list ref] wrote:
quoted
From: Menglong Dong <redacted>

In tcp_retransmit_timer(), a window shrunk connection will be regarded
as timeout if 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX'. This is not
right all the time.

The retransmits will become zero-window probes in tcp_retransmit_timer()
if the 'snd_wnd==0'. Therefore, the icsk->icsk_rto will come up to
TCP_RTO_MAX sooner or later.

However, the timer is not precise enough, as it base on timer wheel.
Sorry that I am not good at timer, but I know the concept of time-wheel.
The longer of the timer, the rougher it will be. So the timeout is not
triggered after TCP_RTO_MAX, but 122877ms as I tested.

Therefore, 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX' is always true
once the RTO come up to TCP_RTO_MAX.

Fix this by replacing the 'tcp_jiffies32' with '(u32)icsk->icsk_timeout',
which is exact the timestamp of the timeout.

Signed-off-by: Menglong Dong <redacted>
---
 net/ipv4/tcp_timer.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 470f581eedd4..3a20db15a186 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -511,7 +511,11 @@ void tcp_retransmit_timer(struct sock *sk)
                                            tp->snd_una, tp->snd_nxt);
                }
 #endif
-               if (tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX) {
+               /* It's a little rough here, we regard any valid packet that
+                * update tp->rcv_tstamp as the reply of the retransmitted
+                * packet.
+                */
+               if ((u32)icsk->icsk_timeout - tp->rcv_tstamp > TCP_RTO_MAX) {
                        tcp_write_err(sk);
                        goto out;
                }
One potential pre-existing issue with this logic: if the connection is
restarting from idle, then tp->rcv_tstamp could already be a long time
(minutes or hours) in the past even on the first RTO, in which case
the very first RTO that found a zero tp->snd_wnd  would find this
check returns true, and would destroy the connection immediately. This
seems extremely brittle.
Yes, this scenario can happen and cause the connection
break unexpectedly.
AFAICT it would be safer to replace this logic with a call to the
standard tcp_write_timeout() logic that has a more robust check to see
if the connection should be destroyed.
Yes, we need a more robust check here. But I think tcp_write_timeout()
is not a good choice. The icsk->icsk_retransmits won't increase and
can keep being 0 in this scenario, which makes tcp_write_timeout()
always return 0.

Enn...let me think again.
neal
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help