Re: [PATCH] 8xx: get_mmu_context() for (very) FEW_CONTEXTS and KERNEL_PREEMPT race/starvation issue
From: Guillaume Autran <hidden>
Date: 2005-06-29 21:24:21
Hi Marcelo,

Marcelo Tosatti wrote:
> Hi Guillaume,
>
> On Wed, Jun 29, 2005 at 11:32:19AM -0400, Guillaume Autran wrote:
>> Benjamin Herrenschmidt wrote:
>>> On Tue, 2005-06-28 at 09:42 -0400, Guillaume Autran wrote:
>>>> Hi, I happened to notice a race condition in the mmu_context code for the 8xx with very few contexts (16 MMU contexts) and kernel preemption enabled. It is hard to reproduce, as it shows only when many processes are created/destroyed and the system is doing a lot of IRQ processing. In short, one process is trying to steal a context that is in the process of being freed (mm->context == NO_CONTEXT) but not completely freed (nr_free_contexts == 0). The steal_context() function does not do anything and the process stays in the loop forever. Anyway, I got a patch that fixes this part. It does not seem to affect scheduling latency at all. Comments are appreciated.
>>>
>>> Your patch seems to do a hell of a lot more than fixing this race ... What about just calling preempt_disable() in destroy_context() instead?
>>
>> I'm still a bit confused with "kernel preemption". One thing for sure is that disabling kernel preemption does indeed fix my problem. So, my question is, what if a task in the middle of being scheduled gets preempted by an IRQ handler, where will this task restart execution? Back at the beginning of schedule or where it left off?
>
> Execution is resumed exactly where it has been interrupted.

In that case, what happens when a higher-priority task steals the context of the lower-priority task after get_mmu_context() but before set_mmu_context()? Then when the lower-priority task resumes, its context may no longer be valid... Do I get this right?

>> The idea behind my patch was to get rid of that nr_free_contexts counter that is (I think) redundant with the context_map.
>
> Apparently it's there to avoid the spinlock exactly on !FEW_CONTEXTS machines.
>
> I suppose that what happens is that get_mmu_context() gets preempted after stealing a context (so nr_free_contexts = 0), but before setting next_mmu_context to the next entry:
>
>     next_mmu_context = (ctx + 1) & LAST_CONTEXT;
>
> So if the now-running higher-prio task calls switch_mm() (which is likely to happen), it loops forever on atomic_dec_if_positive(&nr_free_contexts), while steal_context() sees "mm->context == CONTEXT".
>
> I think that you should try a preempt_disable()/preempt_enable() pair at entry and exit of get_mmu_context() - I suppose around destroy_context() is not enough (you can try that also). spinlock ends up calling preempt_disable().

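The stuck state described at the start of the thread (mm->context == NO_CONTEXT while nr_free_contexts == 0) can be reproduced in a small user-space model. What follows is only a sketch under stated assumptions, not the kernel's code: mm_model, context_owner, destroy_begin(), destroy_finish(), steal_context_model(), acquire_with_limit() and setup_half_destroyed() are hypothetical names, the teardown is split in two by hand to stand in for a preemption at that point, and the loop is bounded so that the model terminates.

```c
#include <assert.h>
#include <stddef.h>

#define NO_CONTEXT   (-1)
#define NUM_CONTEXTS 16   /* the 16-context case mentioned in the thread */

/* Hypothetical stand-in for the part of an mm that holds its MMU context. */
struct mm_model {
    int context;          /* NO_CONTEXT or a slot index */
};

static struct mm_model *context_owner[NUM_CONTEXTS]; /* who holds each slot */
static unsigned int context_map;     /* bit i set = slot i in use */
static int nr_free_contexts;         /* modeled free-context counter */
static int next_mmu_context;         /* slot the stealer looks at */

/* First half of teardown: release the bitmap slot and mark the mm. */
static void destroy_begin(struct mm_model *mm)
{
    assert(mm->context != NO_CONTEXT);
    context_map &= ~(1u << mm->context);
    mm->context = NO_CONTEXT;
}

/* Second half: only now does the counter report a free context. */
static void destroy_finish(void)
{
    nr_free_contexts++;
}

/* Try to free the slot at next_mmu_context.
 * Returns 1 if a context was made available, 0 if nothing was done. */
static int steal_context_model(void)
{
    struct mm_model *victim = context_owner[next_mmu_context];

    if (victim == NULL || victim->context == NO_CONTEXT)
        return 0;         /* victim is already mid-teardown: no-op */
    destroy_begin(victim);
    destroy_finish();
    return 1;
}

/* Bounded form of "while no free context, steal one".
 * Returns the number of failed attempts before success, or -1. */
static int acquire_with_limit(int max_attempts)
{
    for (int i = 0; i < max_attempts; i++) {
        if (nr_free_contexts > 0) {
            nr_free_contexts--;
            return i;
        }
        steal_context_model();
    }
    return -1;
}

/* All slots in use, and the owner of slot 0 "preempted" halfway
 * through teardown, before the counter was incremented. */
static void setup_half_destroyed(struct mm_model mms[NUM_CONTEXTS])
{
    for (int i = 0; i < NUM_CONTEXTS; i++) {
        mms[i].context = i;
        context_owner[i] = &mms[i];
    }
    context_map = (1u << NUM_CONTEXTS) - 1;
    nr_free_contexts = 0;
    next_mmu_context = 0;
    destroy_begin(&mms[0]);
}
```

In this model, calling setup_half_destroyed() and then acquire_with_limit() exhausts every attempt, because the stealer keeps finding a victim that is already marked NO_CONTEXT. Once destroy_finish() runs, the next acquire succeeds immediately. On a uniprocessor with a spinning higher-priority task, that second half would never get the chance to run.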
I'm going to do it like this instead of my previous attempt:
/* Setup new userspace context */
preempt_disable();
get_mmu_context(next);
set_context(next->context, next->pgd);
preempt_enable();
To make sure we don't lose our context in between.
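
The effect of the disable/enable pair on the window between getting and setting a context can be illustrated with a toy user-space model. This is a sketch of the concern only and does not establish that the window exists in the actual kernel path. Every name here (mm_model, preempt_count, preemption_point(), preempt_disable_model(), preempt_enable_model(), switch_in(), the fixed slot 5) is hypothetical, and the "preempting task" is assumed to always steal the victim's context whenever preemption is allowed.

```c
#include <assert.h>
#include <stddef.h>

#define NO_CONTEXT (-1)

/* Hypothetical stand-in for the part of an mm that holds its MMU context. */
struct mm_model {
    int context;
};

static int preempt_count;            /* > 0 means preemption is disabled */
static struct mm_model *victim;      /* mm the preempting task steals from */
static int hw_context = NO_CONTEXT;  /* what the modeled set_context wrote */

/* A point where the running task could be preempted. In this model the
 * preempting task always steals the victim's context if it is allowed
 * to run at all. */
static void preemption_point(void)
{
    if (preempt_count == 0 && victim != NULL)
        victim->context = NO_CONTEXT;
}

static void preempt_disable_model(void)
{
    preempt_count++;
}

static void preempt_enable_model(void)
{
    assert(preempt_count > 0);
    preempt_count--;
}

/* Models the "get a context, then program it" sequence.
 * Returns 1 if a valid context was programmed, 0 if it was lost
 * in the window between the two steps. */
static int switch_in(struct mm_model *mm, int protect)
{
    int ok;

    if (protect)
        preempt_disable_model();

    if (mm->context == NO_CONTEXT)
        mm->context = 5;             /* stands in for get_mmu_context() */

    preemption_point();              /* window between get and set */

    hw_context = mm->context;        /* stands in for set_context() */
    ok = (hw_context != NO_CONTEXT);

    if (protect)
        preempt_enable_model();
    return ok;
}
```

With protect set to 0 the modeled steal lands inside the window and switch_in() programs NO_CONTEXT. With protect set to 1 the steal is held off until after the context has been programmed, which is the property the pair is meant to provide.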
Thanks.
Guillaume.
--
=======================================
Guillaume Autran
Senior Software Engineer
MRV Communications, Inc.
Tel: (978) 952-4932 office
E-mail: gautran@mrv.com
=======================================