Re: [PATCH net-next 3/3] net: tcp: check timeout by icsk->icsk_timeout in tcp_retransmit_timer()
From: Menglong Dong <hidden>
Date: 2023-07-31 03:41:57
Also in:
lkml
On Sat, Jul 29, 2023 at 1:15 AM Neal Cardwell [off-list ref] wrote:
On Fri, Jul 28, 2023 at 7:25 AM Neal Cardwell [off-list ref] wrote:quoted
On Fri, Jul 28, 2023 at 1:50 AM Eric Dumazet [off-list ref] wrote:quoted
On Fri, Jul 28, 2023 at 8:25 AM Menglong Dong [off-list ref] wrote:quoted
On Fri, Jul 28, 2023 at 12:44 PM Neal Cardwell [off-list ref] wrote:quoted
On Thu, Jul 27, 2023 at 7:57 PM Menglong Dong [off-list ref] wrote:quoted
On Fri, Jul 28, 2023 at 3:31 AM Eric Dumazet [off-list ref] wrote:quoted
On Thu, Jul 27, 2023 at 2:52 PM [off-list ref] wrote:quoted
From: Menglong Dong <redacted> In tcp_retransmit_timer(), a window shrunk connection will be regarded as timeout if 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX'. This is not right all the time. The retransmits will become zero-window probes in tcp_retransmit_timer() if the 'snd_wnd==0'. Therefore, the icsk->icsk_rto will come up to TCP_RTO_MAX sooner or later. However, the timer is not precise enough, as it base on timer wheel. Sorry that I am not good at timer, but I know the concept of time-wheel. The longer of the timer, the rougher it will be. So the timeout is not triggered after TCP_RTO_MAX, but 122877ms as I tested. Therefore, 'tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX' is always true once the RTO come up to TCP_RTO_MAX. Fix this by replacing the 'tcp_jiffies32' with '(u32)icsk->icsk_timeout', which is exact the timestamp of the timeout. Signed-off-by: Menglong Dong <redacted> --- net/ipv4/tcp_timer.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 470f581eedd4..3a20db15a186 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c@@ -511,7 +511,11 @@ void tcp_retransmit_timer(struct sock *sk) tp->snd_una, tp->snd_nxt); } #endif - if (tcp_jiffies32 - tp->rcv_tstamp > TCP_RTO_MAX) { + /* It's a little rough here, we regard any valid packet that + * update tp->rcv_tstamp as the reply of the retransmitted + * packet. + */ + if ((u32)icsk->icsk_timeout - tp->rcv_tstamp > TCP_RTO_MAX) { tcp_write_err(sk); goto out; }One potential pre-existing issue with this logic: if the connection is restarting from idle, then tp->rcv_tstamp could already be a long time (minutes or hours) in the past even on the first RTO, in which case the very first RTO that found a zero tp->snd_wnd would find this check returns true, and would destroy the connection immediately. This seems extremely brittle.
Yes, this scenario can happen and cause the connection break unexpectedly.
AFAICT it would be safer to replace this logic with a call to the standard tcp_write_timeout() logic that has a more robust check to see if the connection should be destroyed.
Yes, we need a more robust check here. But I think tcp_write_timeout() is not a good choice. The icsk->icsk_retransmits won't increase and can keep being 0 in this scenario, which makes tcp_write_timeout() always return 0. Enn...let me think again.
neal