Thread: 46 messages, 6 authors, 2011-09-22

Re: [PATCH -rt] ipc/sem: Rework semaphore wakeups

From: Peter Zijlstra <peterz@infradead.org>
Date: 2011-09-14 19:23:42
Also in: lkml

On Wed, 2011-09-14 at 20:48 +0200, Manfred Spraul wrote:
> On 09/14/2011 11:57 AM, Peter Zijlstra wrote:
>> Subject: ipc/sem: Rework semaphore wakeups
>> From: Peter Zijlstra <redacted>
>> Date: Tue Sep 13 15:09:40 CEST 2011
>>
>> Current sysv sems have a weird ass wakeup scheme that involves keeping
>> preemption disabled over a potential O(n^2) loop and busy waiting on
>> that on other CPUs.
> Have you checked that the patch improves the latency?
> Note that the busy wait only happens if there is a simultaneous timeout
> of a semtimedop() and a true wakeup.
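To make that busy wait concrete, here is a simplified userspace model of the two-step wakeup under discussion, loosely based on the IN_WAKEUP status marker that ipc/sem.c used at the time. It is a sketch under stated assumptions, not kernel code: the struct, the function names and the 1 ms delay are hypothetical, C11 atomics stand in for the kernel's status field and barriers, and a pthread plays the waker. It models the case described above, where the sleeper's timeout fires after the waker has marked it but before the waker has published the real result.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

#define IN_WAKEUP 1          /* transient marker; never a real result */

struct sem_waiter {          /* hypothetical stand-in for the per-sleeper queue entry */
    atomic_int status;
};

/* Waker, step 1 (inside the lock): mark the sleeper as "being woken". */
static void wakeup_prepare(struct sem_waiter *w) {
    atomic_store(&w->status, IN_WAKEUP);
}

/* Waker, step 2 (after the lock is dropped): publish the real result. */
static void wakeup_finish(struct sem_waiter *w, int result) {
    atomic_store(&w->status, result);
}

/* Sleeper whose timeout fired: it must not return while the waker is
 * between the two steps, so it spins. If the waker is preempted in that
 * window, this loop burns a CPU until the waker runs again, which is why
 * preemption was kept disabled across the window. */
static int get_result(struct sem_waiter *w) {
    int r = atomic_load(&w->status);
    while (r == IN_WAKEUP)
        r = atomic_load(&w->status);
    return r;
}

static struct sem_waiter demo_w;

static void *slow_waker(void *arg) {
    (void)arg;
    struct timespec delay = { 0, 1000000 };  /* 1 ms: simulated preemption of the waker */
    nanosleep(&delay, NULL);
    wakeup_finish(&demo_w, 0);               /* 0 = operation succeeded */
    return NULL;
}

/* The waker has already done step 1 when the sleeper's timeout fires. */
int demo_in_wakeup(void) {
    pthread_t t;
    atomic_init(&demo_w.status, -1);
    wakeup_prepare(&demo_w);
    if (pthread_create(&t, NULL, slow_waker, NULL) != 0)
        return -1;
    int r = get_result(&demo_w);             /* spins for roughly the simulated delay */
    pthread_join(t, NULL);
    return r;
}
```

demo_in_wakeup() returns the published result (0) only after the waker finishes step 2; the time the sleeper spends in get_result() is exactly the window that preempt_disable() was meant to keep short.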

> The code does:
>
>      spin_lock()
>      preempt_disable();
>      usually_very_simple_but_worstcase_O_2
>      spin_unlock()
>      usually_very_simple_but_worstcase_O_1
>      preempt_enable();
>
> with your change, it becomes:
>
>      spin_lock()
>      usually_very_simple_but_worstcase_O_2
>      usually_very_simple_but_worstcase_O_1
>      spin_unlock()
>
> The complex ops remain unchanged, they are still under a lock.

On -rt that is a preemptible lock (aka a pi-mutex), so no weird latencies.
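For contrast, a userspace sketch of the shape the patch moves to, where the result is published and the sleeper signalled while the lock is still held. A POSIX mutex created with PTHREAD_PRIO_INHERIT stands in for the -rt sleeping lock and a condition variable stands in for the sleep/wakeup; all other names are hypothetical and the code is an analogy, not the patch. Because the sleeper re-acquires the lock before reading the result, it can never observe a half-finished wakeup, and if it has to wait for the waker it sleeps on the PI lock instead of spinning.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

struct pi_waitq {
    pthread_mutex_t lock;   /* PI mutex: stand-in for the -rt sleeping lock */
    pthread_cond_t cond;
    int done;               /* written only with lock held */
    int status;             /* final result, written only with lock held */
};

static int pi_waitq_init(struct pi_waitq *q) {
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init(&attr);
    if (err)
        return err;
    /* A higher-priority thread blocked on the lock boosts the holder
     * (this matters under real-time scheduling policies). */
    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (!err)
        err = pthread_mutex_init(&q->lock, &attr);
    pthread_mutexattr_destroy(&attr);
    if (err)
        return err;
    q->done = 0;
    q->status = -1;
    return pthread_cond_init(&q->cond, NULL);
}

/* Waker: the whole wakeup, including publishing the result, is done
 * inside the critical section; there is no transient state to spin on. */
static void pi_wake(struct pi_waitq *q, int result) {
    pthread_mutex_lock(&q->lock);
    q->status = result;
    q->done = 1;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

/* Sleeper: blocks rather than spins, and reads status under the lock. */
static int pi_wait(struct pi_waitq *q) {
    pthread_mutex_lock(&q->lock);
    while (!q->done)
        pthread_cond_wait(&q->cond, &q->lock);
    int r = q->status;
    pthread_mutex_unlock(&q->lock);
    return r;
}

static struct pi_waitq demo_q;

static void *pi_waker(void *arg) {
    (void)arg;
    pi_wake(&demo_q, 0);                     /* 0 = operation succeeded */
    return NULL;
}

int demo_pi_wakeup(void) {
    pthread_t t;
    if (pi_waitq_init(&demo_q) != 0)
        return -1;
    if (pthread_create(&t, NULL, pi_waker, NULL) != 0)
        return -1;
    int r = pi_wait(&demo_q);
    pthread_join(t, NULL);
    pthread_cond_destroy(&demo_q.cond);
    pthread_mutex_destroy(&demo_q.lock);
    return r;
}
```

The cost is that the lock is held across the wakeup; the argument above is that on -rt this is acceptable because holding a PI lock does not disable preemption.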

> What about removing the preempt_disable?
> It's only there to cover a rare race on uniprocessor preempt systems
> (a task is woken up simultaneously due to timeout of semtimedop() and a
> true wakeup).
>
> Then fix that race with something like the attached patch [obviously
> buggy - see the fixme].

sched_yield() is always a bug, as it is here. It's a live-lock if the
woken task is of higher priority than the waking task. A higher-prio
FIFO task calling sched_yield() in a loop is just that, a loop, starving
the lower-prio waker.

If you've got enough medium-prio tasks around to occupy all other CPUs,
you've got indefinite priority inversion, so even on SMP it's a problem.
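The difference between yielding and blocking can be sketched in a few lines of userspace C. This is an illustration under assumptions, not code from either patch: the flag, the POSIX semaphore and all function names are hypothetical. Under the default SCHED_OTHER policy both waits below terminate; the comment on the yielding loop records why it can live-lock under SCHED_FIFO, as argued above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int woken;   /* hypothetical "result is ready" flag */
static sem_t ready_sem;    /* posted by the waker; the sleeper blocks on it */

/* Anti-pattern: under SCHED_FIFO, sched_yield() only hands the CPU to
 * runnable tasks of the same priority. If this thread outranks the waker
 * and no other CPU is free to run the waker, the flag never changes and
 * this loop never ends. */
static void wait_by_yielding(void) {
    while (!atomic_load(&woken))
        sched_yield();
}

/* Blocking alternative: the sleeper leaves the run queue until the waker
 * posts, so the waker gets to run whatever the relative priorities. */
static void wait_by_blocking(void) {
    while (sem_wait(&ready_sem) != 0 && errno == EINTR)
        ;   /* retry if interrupted by a signal */
}

static void *waker(void *arg) {
    (void)arg;
    atomic_store(&woken, 1);
    sem_post(&ready_sem);
    return NULL;
}

/* Runs both waits under the default (non-RT) policy; returns 0 on success. */
int demo_wait_styles(void) {
    pthread_t t;
    atomic_store(&woken, 0);
    if (sem_init(&ready_sem, 0, 0) != 0)
        return -1;
    if (pthread_create(&t, NULL, waker, NULL) != 0)
        return -1;
    wait_by_yielding();   /* terminates under SCHED_OTHER only */
    wait_by_blocking();
    pthread_join(t, NULL);
    sem_destroy(&ready_sem);
    return atomic_load(&woken) == 1 ? 0 : -1;
}
```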

But yeah, it's not the prettiest of solutions, but it works. See that
other patch with the wake-list stuff for something that ought to work
for both -rt and mainline (except, of course, it doesn't actually work).
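A deferred wakeup list, in the general sense meant here, records whom to wake while the lock is held and performs the wakeups only after the lock is dropped. The following is a minimal userspace sketch of that idea, assuming a pthread mutex for the semaphore-array lock and one POSIX semaphore per sleeper; the names (wake_list_add(), wake_list_run(), struct waiter) are hypothetical and are not taken from the patch referred to. Each result is written before its wakeup, so no IN_WAKEUP-style transient state is needed, and the lock is not held across the wakeups. The sketch sidesteps object lifetime by keeping the waiters in a static array; in the kernel a sleeper's queue entry can go away once its result is visible, so a real wake list has to keep whatever it wakes alive until the wakeup is done.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define NWAITERS 4

struct waiter {
    sem_t wake;            /* the sleeper blocks here */
    int status;            /* result, written by the waker before the wakeup */
    int seen;              /* what the sleeper read after waking */
    struct waiter *next;   /* link in the waker's deferred list */
};

/* Stand-in for the semaphore-array lock. */
static pthread_mutex_t array_lock = PTHREAD_MUTEX_INITIALIZER;
static struct waiter waiters[NWAITERS];

/* Called with array_lock held: publish the result and queue the wakeup. */
static void wake_list_add(struct waiter **head, struct waiter *w, int result) {
    w->status = result;
    w->next = *head;
    *head = w;
}

/* Called after array_lock is dropped: perform the queued wakeups. */
static void wake_list_run(struct waiter *head) {
    while (head) {
        struct waiter *next = head->next;  /* read the link first; in general the
                                              entry may be reused once the sleeper runs */
        sem_post(&head->wake);             /* also makes status visible to the sleeper */
        head = next;
    }
}

static void *sleeper(void *arg) {
    struct waiter *w = arg;
    while (sem_wait(&w->wake) != 0 && errno == EINTR)
        ;                                  /* retry if interrupted by a signal */
    w->seen = w->status;
    return NULL;
}

/* Wakes NWAITERS sleepers with results 0..NWAITERS-1; returns their sum. */
int demo_wake_list(void) {
    pthread_t tids[NWAITERS];
    struct waiter *pending = NULL;
    int sum = 0;

    for (int i = 0; i < NWAITERS; i++) {
        sem_init(&waiters[i].wake, 0, 0);
        waiters[i].status = waiters[i].seen = -1;
        pthread_create(&tids[i], NULL, sleeper, &waiters[i]);
    }

    pthread_mutex_lock(&array_lock);
    for (int i = 0; i < NWAITERS; i++)
        wake_list_add(&pending, &waiters[i], i);  /* decide under the lock */
    pthread_mutex_unlock(&array_lock);

    wake_list_run(pending);                        /* wake outside the lock */

    for (int i = 0; i < NWAITERS; i++) {
        pthread_join(tids[i], NULL);
        sum += waiters[i].seen;
        sem_destroy(&waiters[i].wake);
    }
    return sum;
}
```

With four sleepers given results 0 to 3, demo_wake_list() returns 6, confirming that every sleeper read the result published for it before its wakeup.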