Re: rt14: strace -> migrate_disable_atomic imbalance
From: Peter Zijlstra <peterz@infradead.org>
Date: 2011-09-22 15:13:36
Also in: lkml
On Thu, 2011-09-22 at 16:52 +0200, Oleg Nesterov wrote:
> On 09/22, Peter Zijlstra wrote:
> > +static void wait_task_inactive_sched_in(struct preempt_notifier *n, int cpu)
> > +{
> > +	struct task_struct *p;
> > +	struct wait_task_inactive_blocked *blocked =
> > +		container_of(n, struct wait_task_inactive_blocked, notifier);
> > +
> > +	hlist_del(&n->link);
> > +
> > +	p = ACCESS_ONCE(blocked->waiter);
> > +	blocked->waiter = NULL;
> > +	wake_up_process(p);
> > +}
> > ...
> > +static void
> > +wait_task_inactive_sched_out(struct preempt_notifier *n, struct task_struct *next)
> > +{
> > +	if (current->on_rq) /* we're not inactive yet */
> > +		return;
> > +
> > +	hlist_del(&n->link);
> > +	n->ops = &wait_task_inactive_ops_post;
> > +	hlist_add_head(&n->link, &next->preempt_notifiers);
> > +}
>
> Tricky ;) Yes, the first ->sched_out() is not enough.
Not enough isn't the problem; it's run with rq->lock held and IRQs disabled, so you simply cannot do a ttwu() from there. If we could, the subsequent task_rq_lock() in wait_task_inactive() would be enough to serialize against the still in-flight context switch.

One of the problems with doing it from the next sched_in notifier is that next can be idle, and then we do an A -> idle -> B switch, which is of course sub-optimal.
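For readers following along, the two-stage handoff can be modelled in userspace. This is only a toy sketch, not the patch: struct toy_task, struct toy_notifier, toy_context_switch() and the rq_locked flag are all made up for illustration, each task holds at most one notifier, and the "wakeup" just sets a flag. It assumes, as described above, that sched_out callbacks run with the runqueue lock held and sched_in callbacks run after it is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_task;

/* Hypothetical stand-in for struct preempt_notifier plus the waiter pointer. */
struct toy_notifier {
	void (*sched_out)(struct toy_notifier *n, struct toy_task *prev,
			  struct toy_task *next);
	void (*sched_in)(struct toy_notifier *n, struct toy_task *self);
	struct toy_task *waiter;
};

/* Hypothetical stand-in for task_struct; one notifier slot instead of a list. */
struct toy_task {
	bool on_rq;
	bool woken;
	struct toy_notifier *notifier;
};

static bool rq_locked;          /* models "rq->lock held, IRQs off" */
static int wakeups_under_lock;  /* counts wakeups attempted in that state */

static void toy_wake(struct toy_task *t)
{
	if (rq_locked)
		wakeups_under_lock++;   /* the case that must never happen */
	t->woken = true;
}

/* Stage 2: runs on the incoming task, after the lock is dropped. */
static void stage2_sched_in(struct toy_notifier *n, struct toy_task *self)
{
	struct toy_task *w = n->waiter;

	self->notifier = NULL;
	n->waiter = NULL;
	toy_wake(w);
}

/* Stage 1: runs under the lock, so it only re-arms itself on next. */
static void stage1_sched_out(struct toy_notifier *n, struct toy_task *prev,
			     struct toy_task *next)
{
	if (prev->on_rq)            /* merely preempted, not inactive yet */
		return;

	prev->notifier = NULL;
	n->sched_out = NULL;
	n->sched_in = stage2_sched_in;
	next->notifier = n;         /* toy: assumes next has no notifier */
}

static void toy_context_switch(struct toy_task *prev, struct toy_task *next)
{
	rq_locked = true;
	if (prev->notifier && prev->notifier->sched_out)
		prev->notifier->sched_out(prev->notifier, prev, next);
	rq_locked = false;          /* switch done, lock released */

	if (next->notifier && next->notifier->sched_in)
		next->notifier->sched_in(next->notifier, next);
}
```

In this model the waiter is only ever woken from the second callback, which is why the wakeup never happens with the lock held. It also shows the cost mentioned above: the wakeup is tied to whatever task runs next, idle included.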
> > unsigned long wait_task_inactive(struct task_struct *p, long match_state)
> > {
> > ...
> > +	rq = task_rq_lock(p, &flags);
> > +	trace_sched_wait_task(p);
> > +	if (!p->on_rq) /* we're already blocked */
> > +		goto done;
>
> This doesn't look right. schedule() clears ->on_rq long before
> __switch_to/etc.
Oh, bugger, yes, it's before we can drop the rq->lock for idle balance and nonsense like that. (!p->on_rq && !p->on_cpu) should suffice, I think.
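To spell out why the ->on_rq test alone fires too early, here is a small userspace model. It is a sketch under simplifying assumptions: struct toy_sched_state, enum toy_phase and the three-phase ordering are invented for illustration and only mirror the ordering described above (->on_rq cleared first, ->on_cpu cleared once the switch has completed).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two task_struct fields under discussion. */
struct toy_sched_state {
	bool on_rq;   /* still queued on a runqueue */
	bool on_cpu;  /* still executing, including mid context switch */
};

/* Phases a blocking task passes through, in order, in this model. */
enum toy_phase {
	TOY_RUNNING,       /* queued and on a CPU */
	TOY_DEQUEUED,      /* schedule() cleared ->on_rq, switch not done */
	TOY_SWITCHED_OUT,  /* context switch finished, ->on_cpu cleared */
};

static void toy_set_phase(struct toy_sched_state *t, enum toy_phase ph)
{
	t->on_rq = (ph == TOY_RUNNING);
	t->on_cpu = (ph != TOY_SWITCHED_OUT);
}

/* The check from the quoted hunk. */
static bool inactive_on_rq_only(const struct toy_sched_state *t)
{
	return !t->on_rq;
}

/* The combined check suggested above. */
static bool inactive_on_rq_and_on_cpu(const struct toy_sched_state *t)
{
	return !t->on_rq && !t->on_cpu;
}
```

In the TOY_DEQUEUED phase the first predicate already reports the task as inactive while it is still on a CPU; the combined predicate only does so in TOY_SWITCHED_OUT.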
> And it seems that we check ->on_cpu above, this is not UP friendly.
True, but it's what the old code did.. and I was seeing performance suckage compared to the unpatched kernel (not that the p->on_cpu busy wait fixed it)...
> > -		set_current_state(TASK_UNINTERRUPTIBLE);
> > -		schedule_hrtimeout(&to, HRTIMER_MODE_REL);
> > -		continue;
> > -	}
> > +	hlist_add_head(&blocked.notifier.link, &p->preempt_notifiers);
> > +	task_rq_unlock(rq, p, &flags);
>
> I thought about reimplementing wait_task_inactive() too, but afaics
> there is a problem: why can't we race with p doing
> register_preempt_notifier()? I guess register_ needs rq->lock too.
We can actually, now you mention it.. ->pi_lock would be sufficient and less expensive to acquire.