Re: TCP stack bug related to F-RTO?
From: Joe Cao <hidden>
Date: 2009-09-25 15:58:13
Also in:
lkml
Hi Ilpo,

Thanks for the reply! Do you happen to know which patch fixed the problem? Is there a bug tracking system for the Linux kernel?

I studied the FRTO code in the latest kernel, 2.6.31. It seems the problem is still there:

1. Every time an RTO fires, tcp_is_sackfrto(tp) returns 1, so tcp_use_frto() returns true and the server TCP enters FRTO.
2. After the head of the write queue is retransmitted, two new data packets are transmitted and the server receives two dup-ACKs. That makes TCP enter tcp_enter_frto_loss(); however, that function only resets ssthresh and some other fields.
3. When the next, longer RTO fires, tcp_is_sackfrto(tp) again returns 1, so tcp_use_frto() again returns true and the stack enters FRTO again.
4. The above repeats, and the stack cannot retransmit the lost packets any faster.

Is my understanding above correct?

Thanks,
Joe
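For comparison with the steps above: the basic F-RTO algorithm (RFC 5682, Section 2) only probes with new data while it waits for the first two ACKs after the timeout retransmission. If the second ACK is a duplicate, the timeout is treated as genuine and the sender is expected to fall back to conventional RTO recovery (slow start, retransmitting the unacknowledged segments) instead of waiting for another backed-off RTO. The C sketch below is a simplified model of those decision steps only. The names (frto_on_ack, the enums) are hypothetical, it is not the Linux tcp_input.c code, and it omits the SACK-enhanced variant and the "recover" checks of step 2a.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the basic F-RTO decision steps (RFC 5682, Section 2).
 * Hypothetical names; NOT the Linux implementation. The SACK-enhanced
 * variant and the "recover" checks of step 2a are deliberately omitted. */
enum frto_phase {
    FRTO_WAIT_FIRST_ACK,  /* step 1 done: first unacked segment retransmitted */
    FRTO_WAIT_SECOND_ACK, /* step 2b done: up to two new segments sent */
    FRTO_DONE             /* decision made; F-RTO no longer active */
};

enum frto_action {
    FRTO_ACT_NONE,         /* no F-RTO decision pending */
    FRTO_ACT_SEND_NEW,     /* step 2b: transmit up to two new segments */
    FRTO_ACT_RTO_RECOVERY, /* RTO judged genuine: slow start, retransmit unacked data */
    FRTO_ACT_SPURIOUS      /* step 3b: RTO judged spurious */
};

/* Process one incoming ACK. 'advances_una' is true if the ACK acknowledges
 * new data (moves SND.UNA forward); a duplicate ACK passes false. */
enum frto_action frto_on_ack(enum frto_phase *phase, bool advances_una)
{
    assert(phase != NULL);
    switch (*phase) {
    case FRTO_WAIT_FIRST_ACK:
        if (!advances_una) {           /* step 2a: duplicate ACK */
            *phase = FRTO_DONE;
            return FRTO_ACT_RTO_RECOVERY;
        }
        *phase = FRTO_WAIT_SECOND_ACK; /* step 2b: probe with new data */
        return FRTO_ACT_SEND_NEW;
    case FRTO_WAIT_SECOND_ACK:
        *phase = FRTO_DONE;            /* step 3a (dup ACK) or 3b (new data ACKed) */
        return advances_una ? FRTO_ACT_SPURIOUS : FRTO_ACT_RTO_RECOVERY;
    default:
        return FRTO_ACT_NONE;
    }
}
```

Fed the ACK pattern from the trace (the first ACK covers the retransmitted #14, the second is a dup-ACK triggered by new segment #35), this model returns FRTO_ACT_SEND_NEW and then FRTO_ACT_RTO_RECOVERY, i.e. it would leave F-RTO and start retransmitting the remaining holes rather than re-entering F-RTO on the next timeout.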
--- On Fri, 9/25/09, Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> wrote:
From: Ilpo Järvinen <redacted>
Subject: Re: TCP stack bug related to F-RTO?
To: "Ray Lee" <redacted>
Cc: "Joe Cao" <redacted>, "Netdev" <redacted>, "LKML" <redacted>, jcaoco2002@yahoo.com
Date: Friday, September 25, 2009, 6:09 AM

On Thu, 24 Sep 2009, Ray Lee wrote:

> [adding netdev cc:]
>
> On Thu, Sep 24, 2009 at 10:43 AM, Joe Cao [off-list ref] wrote:
>> Hello,
>>
>> I have found the following behavior with different versions of the Linux kernel. The attached pcap trace was collected with the server (192.168.0.13) running 2.6.24 and shows the problem. Basically, the behavior is like this:
>>
>> 1. The client opens up a big window.
>> 2. The server sends 19 packets in a row (pkts #14-#32 in the trace), but all of them are dropped due to congestion.
>> 3. The server hits the RTO and retransmits pkt #14 as #33.
>> 4. The client immediately ACKs #33 (=#14), and the server (seemingly entering F-RTO) expands the window and sends *NEW* pkts #35 and #36. The timeout is doubled to 2*RTO. The client immediately sends two dup-ACKs in response to #35 and #36.
>> 5. After 2*RTO, pkt #15 is retransmitted as #39.
>> 6. The client immediately ACKs #39 (=#15) in #40, and the server continues to expand the window and sends two *NEW* pkts, #41 and #42. Now the timeout is doubled to 4*RTO.
>> 7. After the 4*RTO timeout, #16 is retransmitted.
>> 8. ...
>> 9. The above steps repeat for retransmitting pkts #16-#32, and each time the timeout is doubled.
>> 10. It takes a very long time to retransmit all the lost packets, and before that is done, the client sends a RST because of a timeout.
>>
>> The above behavior looks like F-RTO is in effect, and there seems to be a bug in TCP's congestion control and retransmission algorithm.
>>
>> Why doesn't TCP on the server (running 2.6.24) enter slow start? Why should the server take that long to recover from a short period of packet loss? Has anyone else noticed a similar problem before? If my analysis is wrong, can anyone give me some pointers to what's really wrong and how to fix it?

Yes, 2.6.24 is an obsolete version with known bugs in its FRTO implementation. The fixes never went into the 2.6.24 stable series, as it was _already_ obsolete when the problems were reported and found. The correct fixes can be found in 2.6.25.7 (.7 iirc) and are included from 2.6.26 onward too.

Just in case you happen to run an Ubuntu-based kernel from that era (of course you should be reporting the bug here then...), a word of warning: it seemed nearly impossible for them to get a simple thing like that fixed. I haven't checked whether they eventually came to some sensible conclusion on that matter, or whether it is still unresolved (or, e.g., closed without real resolution).

-- i.
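The length of the recovery in the quoted report can be estimated with simple arithmetic: if each backed-off RTO retransmits only one lost segment, the k-th retransmission goes out at (2^k - 1) * RTO until the timer reaches its cap. The C sketch below assumes an initial RTO of 200 ms (Linux's TCP_RTO_MIN) and the 120 s TCP_RTO_MAX cap; the actual RTO in the trace is not stated, and the helper names are hypothetical. For contrast, it also counts the round trips an idealized slow start would need, assuming a one-segment loss window that doubles every RTT and no further loss.

```c
#include <assert.h>

/* Elapsed time (ms) until the n-th timeout-driven retransmission, assuming
 * each timeout resends exactly one segment and then doubles the RTO
 * (exponential backoff), capped at max_rto_ms. Hypothetical helper. */
unsigned long long backoff_elapsed_ms(unsigned long long initial_rto_ms,
                                      unsigned long long max_rto_ms,
                                      unsigned int n)
{
    assert(initial_rto_ms > 0 && max_rto_ms >= initial_rto_ms);
    unsigned long long rto = initial_rto_ms, elapsed = 0;
    for (unsigned int k = 0; k < n; k++) {
        elapsed += rto;       /* wait out the current timer */
        rto *= 2;             /* back off for the next timeout */
        if (rto > max_rto_ms)
            rto = max_rto_ms; /* clamp, like TCP_RTO_MAX */
    }
    return elapsed;
}

/* RTT rounds an idealized slow start needs to resend 'lost' segments,
 * starting from a one-segment window that doubles each round.
 * Hypothetical helper; ignores ACK clocking details and further loss. */
unsigned int slow_start_rounds(unsigned int lost)
{
    unsigned int sent = 0, cwnd = 1, rounds = 0;
    while (sent < lost) {
        sent += cwnd; /* one window's worth retransmitted this round */
        cwnd *= 2;    /* window doubles per RTT */
        rounds++;
    }
    return rounds;
}
```

Under these assumptions, backoff_elapsed_ms(200, 120000, 19) is 1284600 ms (roughly 21 minutes) to resend the 19 lost segments one per timeout, while slow_start_rounds(19) is 5 round trips, which is why the one-segment-per-RTO pattern easily outlasts the client's patience.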