Thread: 182 messages, 27 authors, 2008-08-01

Re: [TCP bug] stuck distcc connections in latest -git

From: Ingo Molnar <hidden>
Date: 2008-07-22 13:57:59
Also in: lkml

* David Newall [off-list ref] wrote:
> > The hung condition seemed permanent (i waited a couple of minutes).
>
> Not nearly long enough.  Retransmits can be sent as infrequently as
> once per 180 seconds.  I think there's an argument to use one of the
> various patches that reduce your TCP_RTO_MAX, for example OBATA
> Noboru's (http://marc.info/?l=linux-netdev&m=118422471428855): you
> don't have to wait unreasonably long before seeing a retransmit.
> Remember, three minutes!

i know, i waited much more than 180 seconds - about 15 minutes. That is 
more than enough for this LAN connection.

It's all on the LAN directly via a single gigabit switch and no packet 
dropping. I noticed the hung build immediately as it happened.
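
A minimal sketch of the backoff arithmetic behind this exchange, assuming a retransmission timeout that doubles after each unanswered retransmit until it reaches a cap. The 0.2 s starting timeout and the function name are assumptions made for illustration; the 180 s cap is the figure quoted above, not a value checked against the kernel source.

```python
def retransmit_times(initial_rto=0.2, rto_max=180.0, window=15 * 60):
    """Return the times (in seconds) at which retransmits would fire
    within `window`, under simple exponential backoff.

    initial_rto: assumed starting timeout (hypothetical value)
    rto_max:     cap on the timeout (the 180 s quoted in the thread)
    window:      how long the sender is observed (15 minutes here)
    """
    times = []
    elapsed = 0.0
    rto = initial_rto
    # Stop as soon as the next retransmit would fall outside the window.
    while elapsed + rto <= window:
        elapsed += rto
        times.append(elapsed)
        # Double the timeout after each retransmit, but never past the cap.
        rto = min(rto * 2, rto_max)
    return times


if __name__ == "__main__":
    times = retransmit_times()
    print(f"{len(times)} retransmits within 15 minutes")
    print("last one at %.1f s" % times[-1])
```

Under these assumptions the gap between retransmits never exceeds the cap, so a 15 minute wait spans several capped intervals rather than a fraction of one.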
> > I retried the same build 10 times and it would not reproduce - so
> > this again is a hard to reproduce condition. (and there's no chance
> > to get a proper tcpdump either, at these traffic levels)
>
> You really should start that capture, and on both client and server.
> You don't need to dump everything, only traffic to or from
> server:distcc.

It's not feasible. That box did in excess of 200 GB of network traffic 
in the past 7 hours alone. ~10 clients are doing make -j200 type of 
kernel builds to this 16-way buildbox so it is not realistic to tcpdump 
it - especially given the rarity of this problem. (it has not reoccurred 
since then) The network is local LAN, gigabit ethernet over a single 
gigabit switch.
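
For scale, the traffic figure above can be turned into an average rate. This is a rough sketch that assumes 1 GB = 10^9 bytes and treats "in excess of 200 GB" as exactly 200 GB, so the result is a lower bound on the average; the function name is hypothetical.

```python
def average_rate(total_bytes, hours):
    """Return (megabytes per second, megabits per second) for a transfer
    of `total_bytes` spread evenly over `hours`."""
    seconds = hours * 3600
    bytes_per_second = total_bytes / seconds
    return bytes_per_second / 1e6, bytes_per_second * 8 / 1e6


if __name__ == "__main__":
    # 200 GB over 7 hours, the figures given in the message (decimal GB assumed).
    mb_per_s, mbit_per_s = average_rate(200e9, 7)
    print("%.1f MB/s, %.1f Mbit/s on average" % (mb_per_s, mbit_per_s))
```

This is only an average over the whole period; bursts during parallel builds would be higher.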

	Ingo