Thread (7 messages) 7 messages, 2 authors, 2017-05-19

Re: [RFC] arch hardlockup detector interfaces improvement

From: Don Zickus <hidden>
Date: 2017-05-19 19:43:51
Also in: linuxppc-dev, lkml, sparclinux

On Sat, May 20, 2017 at 12:53:06AM +1000, Nicholas Piggin wrote:
quoted
I am curious to know what IBM thinks there.  Currently the HARDLOCKUP
detector sits on top of perf.  I get the impression, you are removing that
dependency.  Is that a permanent thing or are you thinking of switching back
and forth depending on if SOFTLOCKUP is enabled or not?
We want to get away from perf permanently.

The PMU interrupts are not specially non-maskable from a hardware
POV, everything gets masked when you turn off interrupts in hardware.
powerpc arch code implements a software disable layer, and PMU
interrupts are differentiated there by being allowed to run even
under local_irq_disable();

We have a few issues with using perf for it. We disable it by
default because using it for that breaks another PMU feature.

But PMU interupts are not special, so it would be possible to e.g.,
take the timer interrupt before soft disable and have it touch the
watchdog if it fires while under local_irq_disable(). That give
exact same kind of pseudo-NMI as perf interrupts, without using PMU.

Further, we now want to introduce a local_pmu_disable() type of
interface that extends this soft disable layer to perf interrupts
as well for some cases. Once we start doing that, more code will
be exempt from the hardlockup watchdog, whereas a watchdog specific
hook from the timer interrupt would still cover it.
Interesting.  Thanks for that info.

Do you think you can split the patch in half for me?  You are shuffling code
around and making subtle changes in the shuffled code.  It is hard to double
check everything.  Namely watchdog_nmi_reconfigure and watchdog_update_cpus
suddenly appear.

The first patch would be straight code movement, the second the real
changes.  I almost don't care if you throw in the LOCKUP->SOFTLOCKUP changes
in the first patch either.

I am really interested in the subtle changes to make sure you covered the
various race conditions that we have uncovered over the years.

I also would like to see a sample of the watchdog_nmi_reconfigure() you had
in mind.  Usually we hide all the nmi stuff underneath the watchdog
functions.  But those are threaded, which is what you are trying to avoid,
so the placement of the function makes sense but looks odd.

Thanks!

Cheers,
Don
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help