Thread: 105 messages, 7 authors, 2008-09-22

Re: [PATCH] tcp FRTO: in-order-only "TCP proxy" fragility workaround

From: Ilpo Järvinen <hidden>
Date: 2008-09-12 10:16:19
Also in: netfilter-devel

On Thu, 11 Sep 2008, Dâniel Fraga wrote:
> On Thu, 11 Sep 2008 16:44:20 +0300 (EEST)
> "Ilpo Järvinen" [off-list ref] wrote:
>
> > ...I guess it would be possible to remove SCHED_FEAT_HRTICK from
> > /proc/sys/kernel/sched_features then while keeping the hrtimers
> > otherwise enabled to test this.
> >
> > It's possible that hrtimers just affect how easy it is to trigger
> > but at least it seems a useful lead until proven otherwise.
>
> 	You're right Ilpo. After days and days without the problem,
> today it triggered (but I wasn't online at the time, so I couldn't grab
> any data).

Thanks. Once we know what the userspace at the server is doing, it might 
make the problem immediately obvious, though I'm a bit afraid that e.g., 
strace might interfere with the problem so that it resolves right away and 
we're again left with nothing...

> 	So, you're correct. HRtimers just affect how easy it is to
> trigger the issue. In other words: with high resolution timers enabled,
> the problem appears more frequently.
>
> 	If we at least found a way to trigger this, we could
> test it more easily. The problem is having to wait a long time for it
> to happen.
>
> 	Just a curiosity: on your servers,

I don't really have anything I would call a "server" in the sense you
mean. I might occasionally set one up for testing for a very limited
period, but normally it's just ssh and some other services which I use so
rarely that I'd hardly notice, and that's it. I was planning, however,
to set up some day a distcc stress test using all my spare cpu cycles
(I'd like to put it under kvm but that got stalled due to some timing
issue in the guest making it go into an infinite loop). Once I get
that working I could probably easily put other test-only stuff into that
framework as well.

But there are other people around the world besides us :-), and
afaict this is the only (outstanding) report which relates to accept()
ceasing, so I doubt it's a very regularly occurring thing or we
would have heard of it.

> do you use x86_64?

At least on some machines, but as you have discovered it seems to be
service dependent, so that some processes never get stuck; I might only
be running such ones, who knows...

> It seems
> this problem is very specific to x86_64 or appears more often on x86_64
> than x86_32. It never happens on my x86_32 servers.

Ok.

-- 
 i.