Thread: 8 messages, 3 authors, 2009-09-26

Re: TCP stack bug related to F-RTO?

From: Joe Cao <hidden>
Date: 2009-09-25 15:58:13
Also in: lkml

Hi Ilpo,

Thanks for the reply! Do you happen to know which patch fixed the problem? Is there a bug tracking system for the Linux kernel?

I studied the F-RTO code in the latest kernel, 2.6.31. It seems the problem is still there:

1. Every time an RTO fires, tcp_use_frto() returns true because tcp_is_sackfrto(tp) returns 1, and the server's TCP enters F-RTO.
2. After the head of the write queue is retransmitted, two new data packets are transmitted and the server receives two dup-ACKs. That makes TCP call tcp_enter_frto_loss(); however, that only resets ssthresh and some other fields.
3. After another, longer RTO fires, tcp_use_frto() again returns true because tcp_is_sackfrto(tp) returns 1, and the stack enters F-RTO again.
4. The above repeats, so the stack cannot retransmit the lost packets any faster.

Is my understanding above correct?
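
For a rough sense of what this loop costs, here is a minimal back-of-the-envelope sketch in Python (not kernel code). It assumes exactly one lost packet is recovered per RTO and that the timer doubles after every expiry, as in the steps above. The function name recovery_time, the 200 ms initial RTO, and the optional 120-second cap are illustrative assumptions, not values taken from the trace; 19 is the number of dropped packets in the original report quoted below.

```python
def recovery_time(lost_packets, initial_rto, rto_max=None):
    """Seconds until the last lost packet is retransmitted, assuming one
    packet is recovered per RTO expiry and the timer doubles each time."""
    total = 0.0
    rto = initial_rto
    for _ in range(lost_packets):
        total += rto              # wait for the timer, then retransmit one packet
        rto *= 2                  # exponential backoff after each expiry
        if rto_max is not None:
            rto = min(rto, rto_max)   # assumed upper bound on the timer
    return total


if __name__ == "__main__":
    # Assumed values: 19 lost packets, 0.2 s initial RTO.
    print("uncapped:", recovery_time(19, 0.2), "seconds")
    print("capped at 120 s:", recovery_time(19, 0.2, rto_max=120.0), "seconds")
```

Under these assumptions the total grows as initial_rto * (2^n - 1) until the cap takes over, which is consistent with the client giving up and sending a RST long before all 19 packets are retransmitted.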

Thanks,
Joe 
--- On Fri, 9/25/09, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
From: Ilpo Järvinen <redacted>
Subject: Re: TCP stack bug related to F-RTO?
To: "Ray Lee" <redacted>
Cc: "Joe Cao" <redacted>, "Netdev" <redacted>, "LKML" <redacted>, jcaoco2002@yahoo.com
Date: Friday, September 25, 2009, 6:09 AM
On Thu, 24 Sep 2009, Ray Lee wrote:
> [adding netdev cc:]
>
> On Thu, Sep 24, 2009 at 10:43 AM, Joe Cao [off-list ref] wrote:
>> Hello,
>>
>> I have found the following behavior with different versions of the
>> Linux kernel. The attached pcap trace was collected with the server
>> (192.168.0.13) running 2.6.24 and shows the problem. Basically, the
>> behavior is like this:
>>
>> 1. The client opens up a big window.
>> 2. The server sends 19 packets in a row (pkt #14-#32 in the trace),
>>    but all of them are dropped due to some congestion.
>> 3. The server hits RTO and retransmits pkt #14 in #33.
>> 4. The client immediately acks #33 (=#14), and the server (which
>>    seems to enter F-RTO) expands the window and sends *NEW* pkt #35
>>    & #36. The timeout is doubled to 2*RTO. The client immediately
>>    sends two dup-ACKs in response to #35 and #36.
>> 5. After 2*RTO, pkt #15 is retransmitted in #39.
>> 6. The client immediately acks #39 (=#15) in #40, and the server
>>    continues to expand the window and sends two *NEW* pkt #41 & #42.
>>    Now the timeout is doubled to 4*RTO.
>> 8. After the 4*RTO timeout, #16 is retransmitted.
>> 9. ...
>> 10. The above steps repeat for retransmitting pkt #16-#32, and each
>>     time the timeout is doubled.
>> 11. It takes a very long time to retransmit all the lost packets,
>>     and before that is done, the client sends a RST because of a
>>     timeout.
>>
>> The above behavior looks like F-RTO is in effect, and there seems to
>> be a bug in TCP's congestion control and retransmission algorithm.
>> Why doesn't TCP on the server (running 2.6.24) enter slow start?
>> Why should the server take that long to recover from a short period
>> of packet loss?
>>
>> Has anyone else noticed a similar problem before? If my analysis is
>> wrong, can anyone give me some pointers to what's really wrong and
>> how to fix it?

Yes, 2.6.24 is an obsoleted version with known bugs in the FRTO
implementation. Fixes never went to the 2.6.24 stable series as it was
_already_ obsoleted when the problems were reported and found. The
correct fixes may be found from 2.6.25.7 (.7 iirc) and are included from
2.6.26 onward too.

Just in case you happen to run an Ubuntu-based kernel from that era (of
course you should be reporting the bug here then...), a word of warning:
it seemed nearly impossible for them to get a simple thing like that
fixed. I haven't been looking whether they eventually came to some
sensible conclusion in that matter or whether it is still unresolved
(or, e.g., closed without real resolution).

-- 
 i.

      