Thread (105 messages), 7 authors, 2008-09-22

Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround

From: Ilpo Järvinen <hidden>
Date: 2008-09-22 11:22:12
Also in: netfilter-devel


On Mon, 22 Sep 2008, Dâniel Fraga wrote:
> On Fri, 19 Sep 2008 00:04:23 +0300 (EEST)
> "Ilpo Järvinen" [off-list ref] wrote:
>
> > Anyway, if/when you succeed in collecting an strace of the server
> > processes, please let me know (though making a full one available might
> > not be wise, like I said earlier). After thinking about it a bit, it
> > might be enough to start strace with -p for all server processes of a
> > service during a stall and then resolve the stall with nmap after some
> > amount of waiting (and hope that strace doesn't resolve it by interfering
> > with something relevant :-); you would see that from the fact that it
> > resolves without nmap). That would probably reveal whether the processes
> > were waiting in accept() or not, and if not, where they were.
>
> Hi again Ilpo, I waited the whole day for a stall, and
> fortunately it happened while I was stracing dovecot and its child
> processes. The stall happened at 01:11 (at the end). I hope that it
> has something useful.

It definitely shows a stall: there are _no_ events between 0:53 and 1:11, 
while there isn't any other period like that; every other minute since the 
start has some activity going on :-). So this might not be related to 
networking at all, as we've kind of already figured out (accept() definitely 
has very little to do with it). There weren't any close() calls there 
either, so it looks very stuck on something outside of the syscalls 
we listed in -e, I suppose...

It seems that the next sensible step is to just obtain a full strace to see 
what actually took place during those long minutes, if anything (it's 
better that you keep that log private and just use grep over it on 
request). A full strace might grow huge, though. Also, with strace use 
-tt instead of -t to get microsecond-resolution timestamps, and add -T to 
record the time spent in each syscall.
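
A minimal sketch of how such an invocation could be assembled, assuming 
the PIDs of the server processes are already known. The helper name 
strace_cmd, the log file name, and the use of -f (follow children) and -o 
(write to a file) are illustrative assumptions, not something taken from 
the earlier runs; only -tt, -T and -p come from the advice above.

```shell
# Hypothetical helper: print an strace command line that attaches to every
# PID given after the log file name. It only prints the command, so it can
# be reviewed before being run as root.
strace_cmd() {
    logfile=$1
    shift
    # -f  follow forked children (assumption, not from the thread)
    # -tt timestamps with microseconds
    # -T  time spent inside each syscall
    # -o  write the trace to a file instead of stderr (assumption)
    cmd="strace -f -tt -T -o $logfile"
    for pid in "$@"; do
        cmd="$cmd -p $pid"
    done
    printf '%s\n' "$cmd"
}

# Example use (PIDs are placeholders):
#   strace_cmd dovecot.strace 1234 1240
```

Without an -e filter the trace contains every syscall, which is the point 
here, but also why the file can grow quickly.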

When you get the stall next time, please also check that the processes are 
actually sleeping instead of looping like crazy in some buggy userspace 
code :-) (obviously before resolving it with nmap).
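
One possible way to make that check, sketched under the assumption of a 
Linux /proc filesystem; the helper name proc_state is hypothetical. It 
prints the one-letter process state: S (sleeping) or D (uninterruptible 
sleep) would suggest the process is blocked, while R (running) on repeated 
samples would suggest it is spinning in userspace.

```shell
# Hypothetical helper: print the one-letter scheduler state of a PID,
# read from the "State:" line of /proc/PID/status (Linux only).
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

# Example use (PID is a placeholder); sample a few times during the stall:
#   proc_state 1234
```

A single sample can be misleading, so it is worth repeating the check a 
few times before resolving the stall.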

When using nmap to resolve it, take note of the exact timestamp (including 
seconds). E.g., 
$ date > nmap.ts; nmap ...


-- 
 i.