Re: perf PPC: kernel panic with callchains and context switch events
From: Paul Mackerras <hidden>
Date: 2011-07-25 00:22:13
On Wed, Jul 20, 2011 at 03:57:51PM -0600, David Ahern wrote:
> I am hoping someone familiar with PPC can help understand a panic that is generated when capturing callchains with context switch events. Call trace is below. The short of it is that walking the callchain generates a page fault. To handle the page fault the mmap_sem is needed, but it is currently held by setup_arg_pages. setup_arg_pages calls shift_arg_pages with the mmap_sem held. shift_arg_pages then calls move_page_tables which has a cond_resched at the top of its for loop. If the cond_resched() is removed from move_page_tables everything works beautifully - no panics.
>
> So, the question: is it normal for walking the stack to trigger a page fault on PPC? The panic is not seen on x86 based systems.
Walking the user stack can certainly generate a page fault; walking the kernel stack should never generate a page fault. If any page fault is generated reading the user stack frame, we're supposed to detect that and fall back to walking the page tables manually (see read_user_stack_64() in arch/powerpc/kernel/perf_callchain.c). I think I need to check our __get_user_inatomic() implementation.

I don't think removing the cond_resched() from move_page_tables is the right answer.

Paul.
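
To illustrate the "try a non-faulting read, then fall back to the page tables" pattern described above, here is a rough userspace sketch. It is a toy model, not the code in arch/powerpc/kernel/perf_callchain.c: every toy_* name, the three-page address space, and the fast_ok flag (standing in for "this access would succeed without taking a fault") are assumptions made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NR_PAGES  3u
#define TOY_TASK_SIZE (TOY_PAGE_SIZE * TOY_NR_PAGES)

/* Toy page-table entry (hypothetical, not a kernel structure). */
struct toy_pte {
    int present;                  /* a frame backs this page */
    int fast_ok;                  /* access succeeds without a fault */
    uint8_t frame[TOY_PAGE_SIZE]; /* the page contents */
};

static struct toy_pte toy_pgtable[TOY_NR_PAGES];

/* Populate one page and store a 64-bit value at the given offset. */
static void toy_map(unsigned page, int fast_ok, unsigned off, uint64_t val)
{
    assert(page < TOY_NR_PAGES && off + sizeof(val) <= TOY_PAGE_SIZE);
    toy_pgtable[page].present = 1;
    toy_pgtable[page].fast_ok = fast_ok;
    memcpy(&toy_pgtable[page].frame[off], &val, sizeof(val));
}

/* Stand-in for a non-faulting user read: it reports -EFAULT
 * instead of trying to handle the fault (no locks, no sleeping). */
static int toy_get_user_inatomic(uint64_t *ret, uint32_t addr)
{
    struct toy_pte *pte = &toy_pgtable[addr / TOY_PAGE_SIZE];

    if (!pte->present || !pte->fast_ok)
        return -EFAULT;
    memcpy(ret, &pte->frame[addr % TOY_PAGE_SIZE], sizeof(*ret));
    return 0;
}

/* Slow path: look the page up in the page table by hand and read
 * the frame directly. Fails only if no page is mapped at all. */
static int toy_read_slow(uint64_t *ret, uint32_t addr)
{
    struct toy_pte *pte = &toy_pgtable[addr / TOY_PAGE_SIZE];

    if (!pte->present)
        return -EFAULT;
    memcpy(ret, &pte->frame[addr % TOY_PAGE_SIZE], sizeof(*ret));
    return 0;
}

/* Read one 64-bit stack slot: fast path first, manual walk on failure. */
static int toy_read_user_stack_64(uint64_t *ret, uint32_t addr)
{
    /* Reject out-of-range or misaligned pointers up front. */
    if (addr > TOY_TASK_SIZE - sizeof(uint64_t) || (addr & 7))
        return -EFAULT;

    if (toy_get_user_inatomic(ret, addr) == 0)
        return 0;

    return toy_read_slow(ret, addr);
}
```

In this model, a page mapped with fast_ok set is read on the fast path, a page that is present but would fault is still read via toy_read_slow(), and only an unmapped page makes toy_read_user_stack_64() return -EFAULT. The point is that neither path ever needs to take a lock or sleep, which is what the callchain walker relies on when it runs while mmap_sem is already held.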