Thread (27 messages) 27 messages, 5 authors, 2014-03-22

Re: Tasks stuck in futex code (in 3.14-rc6)

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: 2014-03-20 16:41:35
Also in: lkml

On Wed, Mar 19, 2014 at 10:56 PM, Davidlohr Bueso [off-list ref] wrote:
This problem suggests that we missed a wakeup for a task that was adding
itself to the queue in a wait path. And the only place that can happen
is with the hb spinlock check for any pending waiters.
Ok, so thinking about hb_waiters_pending():

 - spin_is_locked() test
 - read barrier
 - plist_head_empty() test

seems to be broken after all.

The race is against futex_wait() that does

 - futex_wait_setup():
   - queue_lock()
   - get_futex_value_locked()
 - futex_wait_queue_me()
   - queue_me()
     - plist_add()

right?

It strikes me that the "spin_is_locked()" test has no barriers wrt the
writing of the new futex value on the wake path. And the read barrier
obviously does nothing wrt the write either. Or am I missing
something? So the write that actually released the futex might be
almost arbitrarily delayed on the waking side. So the waiting side may
not see the new value, even though the waker assumes it does due to
the ordering of it doing the write first.

So maybe we need a memory barrier in hb_waiters_pending() just to make
sure that the new value is written.

But at that point, I suspect that Davidlohrs original patch that just
had explicit waiting counts is just as well. The whole point with the
head empty test was to emulate that "do we have waiters" without
having to actually add the atomics, but a memory barrier is really no
worse.

The attached is a TOTALLY UNTESTED interdiff that adds back Davidlohrs
atomic count. It may be terminally broken, I literally didn't test it
at all.

Davidlohr, mind checking and correcting this?

                     Linus

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help