Thread (21 messages) 21 messages, 9 authors, 2014-01-24

Re: [ANNOUNCE] 3.12.6-rt9

From: Mike Galbraith <hidden>
Date: 2014-01-18 03:15:46
Also in: lkml

On Fri, 2014-01-17 at 18:00 +0100, Sebastian Andrzej Siewior wrote: 
* Mike Galbraith | 2013-12-24 16:47:47 [+0100]:
quoted
I built this kernel with Paul's patch and NO_HZ_FULL enabled again on 64
core box.  I haven't seen RCU grip yet, but I just checked on it after
3.5 hours into this boot/beat (after fixing crash+kdump setup), and
found it in the process of dumping. 
So you also have the timers-do-not-raise-softirq-unconditionally.patch?
Oh dear, there's holidays, vacation, and massive turkey overdose between
then and now, but I'm almost positive that the tree was virgin $subject,
with only Paul's patch enabled, that being what I wanted to beat on.
I have a small problem with understanding this…

|#24 [ffff880273a03cd0] run_timer_softirq at ffffffff81069002

Here we obtain wait_lock from tvec_base of _this_ CPU. And we get to
init_lists() before the apic timer kicks in. So we have the wait_lock.
gdb fibs a little, we're acquiring.
quoted hunk ↗ jump to hunk
--- <IRQ stack> ---
quoted
#21 [ffff880273a03b28] apic_timer_interrupt at ffffffff815cbf9d
   [exception RIP: _raw_spin_lock+50]
In the hard interrupt triggered by the apic timer we get to
get_next_timer_interrupt() and go again for same the wait_lock. Here we
have the try_lock so we avoid this deadlock.
The odd part: we get the lock. It should be the same lock because both use
| struct tvec_base *base = __this_cpu_read(tvec_bases);
to ge it. And we shouldn't get it because the lock is already hold.
We get into trouble in the unlock path where we spin forever:

|#14 [ffff880276803e50] rt_spin_unlock_after_trylock_in_irq at ffffffff815c3425
|#12 [ffff880276803e28] _raw_spin_trylock at ffffffff815c3790

which releases the lock with a trylock in order to keep lockdep happy.
My understanding was that we should be able to obtain the wait_lock here
since we were able to obtain it in the lock path and in irq off context
there is nothing that could take the lock in the meantime.
IIRC, we were endlessly trying, but with an un-punched ticket under us,
and no Xen like evilness to save the day.

I've since cleaned out my crashdump directory and moved on to frolicking
with hotplug gremlins, so don't have that one to revisit, but the don't
unconditionally raise timer softirq patch is the bad guy.

-Mike
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help