Thread: 19 messages, 6 authors, 2007-07-18

Re: v2.6.21.5-rt19 (sched_getaffinity?)

From: Fernando Lopez-Lezcano <hidden>
Date: 2007-07-09 05:08:41
Also in: lkml

On Sun, 2007-07-08 at 20:53 -0700, Fernando Pablo Lopez-Lezcano wrote: 
On Mon, 9 Jul 2007, Gabriel C wrote:
Fernando Lopez-Lezcano wrote:
On Sat, 2007-07-07 at 11:24 +0200, Ingo Molnar wrote:
* Fernando Lopez-Lezcano [off-list ref] wrote:
Changes since 2.6.21.5-rt18:
- Fixed a nasty and hard-to-track-down slowness / boot problem on SMP
machines with CONFIG_NO_HZ enabled. The problem was caused by the timer
wheel base lock being held during the get_next_timer_interrupt() call in
the idle path, which eventually led to bogus PI boosting of the idle task
and, in consequence, a stale and wrong scheduler selection for the
affected idle task.

Kudos to Carsten Emde, who patiently and meticulously isolated the
problem and provided the traces, which allowed us to identify the root
cause.

Problem solution: Prevent idle task boosting
Maybe someone remembers me whining about troubles with 2.6.21-rt2..18 on
my Core2 T7200 laptop (Fujitsu-Siemens Amilo i1520).

Although I'm still keeping my fingers crossed, I can tell you the good
news: 2.6.21.5-rt19 (and -rt20) behaves far better now on the very same
box.
Yes, it works much better indeed...

Ingo: is there a place where I can read about the changes in different 
rtxx releases? What is new/better/fixed in rt20? (I see scheduler stuff 
in a diff from rt19 to rt20 but I don't really know what it means).
and rt19 was a -rt-only NOHZ fix; that bug got introduced in rt11 when CFS
was merged.

i _think_ Rui might have seen two separate problems. Perhaps by the time 
we fixed the first problem (which Rui saw since -rt2) we introduced the 
other one via -rt11 - which then got fixed in -rt19.
Ahh, CFS is now part of rt, I was obviously not paying attention... I'm
really trying to provide a "stable" rt kernel for audio usage and
including another subsystem into rt is - IMHO - not going to help.
What's the chance of splitting things?
btw., we'd love to get more feedback regarding CFS. CFS is a completely 
new scheduler for Linux. 
Then I'd rather have it separate from rt.
It has a design centered around keeping application latencies down, so it
is ultimately real-time friendly, and it should also make things work
better for desktop-ish and audio-ish stuff (even under SCHED_OTHER).
Maybe this is CFS related? (tail of a thread in the Planet CCRMA mailing
list):

On Sun, 2007-07-08 at 15:26 -0400, Hector Centeno wrote:
Ok, so just to confirm: 2.6.21-0182.rt19.1.fc7.ccrmart works fine
on my desktop, but on my laptop it makes Firefox and Tomboy crash.
On the same laptop, using 2.6.21-0182.rt17.1.fc7.ccrmart, there is no
problem.
I managed to completely hang firefox (fc7) with flash 9 installed
(unkillable even with -9).
Firefox with Flash 9 does not work well; there are a lot of bugs reported
about it (just google), and it hangs on vanilla or any other kernel
as well. Not only Firefox but also Swiftfox, Opera, Epiphany, etc.

Most of the time Firefox dies when you use Flash 9 and close a window or
a tab.
More tests...

The problem is the rt kernel AFAICT; this goes beyond Flash 9, way
beyond:

_OpenOffice_ hangs with 2.6.21.5-rt20 and works fine with the stock Fedora 7
kernel. Flash 9 hangs with 2.6.21.5-rt20 and works fine with the stock
Fedora 7 kernel. Same machine booting different kernels, so I'd say it is
the kernel.

The only way out for a hung app is a reboot.

Ingo: what would be a good way to trace this? It makes the rt kernels not 
very usable at least on this hardware (more tests tomorrow in the CCRMA 
machines).

Same on 2.6.21.5-rt18 with CONFIG_NO_HZ not set.
I forgot to include the output of strace... and of course now I can't
reproduce the OpenOffice hang.

I do get flash 9 (I know, not the best example) and tomboy to hang as
reported by one of my Planet CCRMA users - flash 9 tested working on
stock fedora 7 kernel - and both seem to hang in the same system call:

sched_getaffinity(3528, 32,  <unfinished ...>

Full output of strace attached for both cases. 
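A minimal sketch for checking whether a bare sched_getaffinity() call
blocks on the affected kernel, independent of Flash or Tomboy. It assumes
glibc (the sched_getaffinity() wrapper plus the CPU_COUNT() macro, glibc
2.6 or later); the helper name count_allowed_cpus() is hypothetical. Note
that the hung apps passed a 32-byte mask to the raw syscall, while this
sketch uses glibc's larger cpu_set_t, so it is not an exact replay of the
traced call.

```c
#define _GNU_SOURCE
#include <sched.h>      /* sched_getaffinity, cpu_set_t, CPU_* macros */
#include <stdio.h>      /* perror */
#include <sys/types.h>  /* pid_t */

/* Hypothetical helper: returns the number of CPUs in the affinity mask
 * of `pid` (0 means the calling thread), or -1 if sched_getaffinity()
 * fails (e.g. ESRCH for a nonexistent task). On a kernel with the hang
 * described in this thread, the call itself may never return. */
static int count_allowed_cpus(pid_t pid)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    if (sched_getaffinity(pid, sizeof(mask), &mask) != 0) {
        perror("sched_getaffinity");
        return -1;
    }
    return CPU_COUNT(&mask);
}
```

Calling count_allowed_cpus(0) queries the calling thread; passing the PID
seen in the strace output (3528 above) queries that task instead.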

Hopefully this will make the bug immediately obvious to someone :-)
[running on a laptop with the 7700 Intel cpu]
-- Fernando

