Thread (17 messages, 9 authors, 2007-07-23)

Re: 2.6.22.1-rt4 lockups

From: Daniel Walker <hidden>
Date: 2007-07-23 20:27:27
Also in: lkml

On Mon, 2007-07-23 at 09:08 -0700, Daniel Walker wrote:
On Sat, 2007-07-21 at 23:07 +0100, Rui Nuno Capela wrote:
Call Trace:
 [<c010622a>] show_trace_log_lvl+0x1a/0x30
 [<c01062f6>] show_stack_log_lvl+0xb6/0xe0
 [<c0106521>] show_registers+0x201/0x330
 [<c0106768>] die+0x118/0x260
 [<c0304233>] do_page_fault+0x193/0x600
 [<c030294a>] error_code+0x72/0x78
 [<c011b4af>] activate_task+0x4f/0xb0
 [<c011e09d>] try_to_wake_up+0x2bd/0x420
 [<c011e279>] wake_up_process_mutex+0x19/0x20
 [<c01425cc>] wakeup_next_waiter+0xec/0x1a0
 [<c030173c>] rt_spin_lock_slowunlock+0x4c/0x70
 [<c0301ff6>] rt_spin_unlock+0x26/0x30
 [<c015b3e4>] put_zone_pcp+0x14/0x20
 [<c015c265>] get_page_from_freelist+0x145/0x380
 [<c015c4f4>] __alloc_pages+0x54/0x2d0
 [<c01652bd>] __handle_mm_fault+0x7dd/0x9a0
 [<c0304398>] do_page_fault+0x2f8/0x600
 [<c030294a>] error_code+0x72/0x78
 =======================
I was able to reproduce a similar-looking hang by running kernbench
together with another load (I used ltpstress.sh from LTP).

I'm debugging it now.

It looks like sched_class->enqueue_task is NULL, and that's why the
system hangs.

The reason this happens is that check_pgt_cache() is called from the
idle thread, and with PREEMPT_RT check_pgt_cache() locks at least one
mutex. Once the idle thread is on a wait_list, the system will crash in
enqueue_task() as soon as the mutex owner wakes it, because the idle
thread's sched_class->enqueue_task is NULL.

check_pgt_cache() is already called from desched_thread(), so I think
it could simply be removed from the i386 cpu_idle().

Anyone have comments on the theory above?

Daniel