Thread: 58 messages, 7 authors, 2009-12-09

Re: scp stalls mysteriously

From: Ilpo Järvinen <hidden>
Date: 2009-11-29 22:13:25


On Sat, 28 Nov 2009, Ilpo Järvinen wrote:
I restored Ccs. Please keep them.

On Sat, 28 Nov 2009, Frederic Leroy wrote:
quoted
On Sat, 28 Nov 2009 00:12:23 +0200 (EET),
"Ilpo Järvinen" [off-list ref] wrote:
quoted
On Fri, 27 Nov 2009, Frederic Leroy wrote:
quoted
I put traces of the stall here:
http://www.starox.org/pub/scp_stall/
Your /proc/net/tcp capture on houba was perhaps made too late? ...The 
connection is already missing.
That could be! I was unsure while using my 2 keyboards...

For your information, I filtered the pcaps using "tcpdump ... ether
host xx:xx:xx:xx:xx".
I waited a bit after the stall and then killed the scp with ctrl-c.
quoted
But anyway, at least the problem is visible...
Great!
quoted
It seems that 3998:4046 never gets retransmitted, not even by RTO, which
seems very, very strange to me... And after this:
23:21:56.154269 IP 192.168.1.19.50028 > 192.168.1.15.22: . ack 3998 win 379 ... sack 3 {4238:4286}{4142:4190}{4046:4094}>
fast retransmit should also have already triggered. ...I'll look more into
this to see if I can figure it out from the current traces, but it'll take
a while.
Would it help if I made more traces?

I won't be available until Monday.
Perhaps having /proc/net/tcp would at least tell what state the timer 
is in (if I cannot reproduce right away). ...It is rather strange that two 
independent loss recovery mechanisms both seem to fail to get 
triggered here, with no traces of retransmission whatsoever. I think it is 
enough for now to concentrate on what happens on 192.168.1.15 (=houba?) and 
get tcpdump and /proc/net/tcp from there; the other end/direction has very 
little significance here (except for the fact that bidirectionality might 
be needed to actually trigger it). You could even think of capturing 
/proc/net/tcp a bit more often, right from the start:

while [ : ]; do grep ":0016" /proc/net/tcp; sleep 0.1; done | tee scp_stall-houba.x.proc_net_tcp

...Please wait at least 2 minutes before hitting ctrl-c or otherwise 
artificially intervening.
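
To read the timer state out of such a capture, something along these lines might help. It is only a rough sketch: the function name decode_tcp_timer is made up for illustration, and the field layout and timer codes (0 = no timer, 1 = retransmit, 2 = keepalive/other, 3 = TIME_WAIT, 4 = zero window probe) are assumed from the kernel's proc_net_tcp documentation, so please check them against your kernel version.

```shell
# Hypothetical helper: decode the timer fields of one /proc/net/tcp line.
# Assumed field order: sl local rem st tx:rx tr:when retrnsmt uid ...
decode_tcp_timer() {
    # Split the line into whitespace-separated positional parameters.
    set -- $1
    state=$4
    timer=${6%%:*}      # "tr": which timer is pending
    when=${6##*:}       # "tm->when": jiffies until it expires (hex)
    retrans=$7          # "retrnsmt": unrecovered RTO timeouts (hex)
    case "$timer" in
        00) name="none" ;;
        01) name="retransmit" ;;
        02) name="keepalive/other" ;;
        03) name="time-wait" ;;
        04) name="zero-window-probe" ;;
        *)  name="unknown" ;;
    esac
    echo "state=$state timer=$name expires_in_jiffies=$((0x$when)) rto_retries=$((0x$retrans))"
}

# Possible usage on the file written by the loop above:
# while read -r line; do decode_tcp_timer "$line"; done < scp_stall-houba.x.proc_net_tcp
```

If the stalled connection showed timer=none while unacknowledged data is outstanding, that would suggest the retransmission timer was never armed, though that is a guess about what to look for, not a diagnosis.
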
So far I've had no luck reproducing exactly the same scenario as you. 
However, I'm currently solving another problem I found, related to excess 
growth in the RTT estimator, which is enough for me to get a temporary but 
long-lasting stall with scp (that growth happens only with 
timestamps, so if I disable them I have better success with the transfer).

-- 
 i.