Re: [PATCH v2 3/6] sched: make double-lock-balance fair
From: Peter Zijlstra <peterz@infradead.org>
Date: 2008-08-27 08:22:06
Also in:
lkml
On Tue, 2008-08-26 at 13:35 -0400, Gregory Haskins wrote:
quoted hunk ↗ jump to hunk
double_lock balance() currently favors logically lower cpus since they often do not have to release their own lock to acquire a second lock. The result is that logically higher cpus can get starved when there is a lot of pressure on the RQs. This can result in higher latencies on higher cpu-ids. This patch makes the algorithm more fair by forcing all paths to have to release both locks before acquiring them again. Since callsites to double_lock_balance already consider it a potential preemption/reschedule point, they have the proper logic to recheck for atomicity violations. Signed-off-by: Gregory Haskins <redacted> --- kernel/sched.c | 52 +++++++++++++++++++++++++++++++++++++++++++++------- 1 files changed, 45 insertions(+), 7 deletions(-)diff --git a/kernel/sched.c b/kernel/sched.c index df6b447..850b454 100644 --- a/kernel/sched.c +++ b/kernel/sched.c@@ -2782,21 +2782,43 @@ static void double_rq_unlock(struct rq *rq1, struct rq *rq2) __release(rq2->lock); } +#ifdef CONFIG_PREEMPT + /* - * double_lock_balance - lock the busiest runqueue, this_rq is locked already. + * fair double_lock_balance: Safely acquires both rq->locks in a fair + * way at the expense of forcing extra atomic operations in all + * invocations. This assures that the double_lock is acquired using the + * same underlying policy as the spinlock_t on this architecture, which + * reduces latency compared to the unfair variant below. However, it + * also adds more overhead and therefore may reduce throughput. */ -static int double_lock_balance(struct rq *this_rq, struct rq *busiest) +static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest) + __releases(this_rq->lock) + __acquires(busiest->lock) + __acquires(this_rq->lock) +{ + spin_unlock(&this_rq->lock); + double_rq_lock(this_rq, busiest); + + return 1; +}
Right - so to belabour Nick's point:
if (!spin_trylock(&busiest->lock)) {
spin_unlock(&this_rq->lock);
double_rq_lock(this_rq, busiest);
}
might unfairly treat someone who is waiting on this_rq if I understand
it right?
I suppose one could then write it like:
if (spin_is_contended(&this_rq->lock) || !spin_trylock(&busiest->lock)) {
spin_unlock(&this_rq->lock);
double_rq_lock(this_rq, busiest);
}
But, I'm not sure that's worth the effort at that point..
Anyway - I think all this is utterly defeated on CONFIG_PREEMPT by the
spin with IRQs enabled logic in kernel/spinlock.c.
Making this an -rt only patch...