Thread (7 messages) 7 messages, 2 authors, 2017-05-19

Re: [RFC] arch hardlockup detector interfaces improvement

From: Nicholas Piggin <npiggin@gmail.com>
Date: 2017-05-19 22:54:22
Also in: linuxppc-dev, lkml, sparclinux

On Fri, 19 May 2017 15:43:45 -0400
Don Zickus [off-list ref] wrote:
On Sat, May 20, 2017 at 12:53:06AM +1000, Nicholas Piggin wrote:
quoted
quoted
I am curious to know what IBM thinks there.  Currently the HARDLOCKUP
detector sits on top of perf.  I get the impression, you are removing that
dependency.  Is that a permanent thing or are you thinking of switching back
and forth depending on if SOFTLOCKUP is enabled or not?  
We want to get away from perf permanently.

The PMU interrupts are not specially non-maskable from a hardware
POV, everything gets masked when you turn off interrupts in hardware.
powerpc arch code implements a software disable layer, and PMU
interrupts are differentiated there by being allowed to run even
under local_irq_disable();

We have a few issues with using perf for it. We disable it by
default because using it for that breaks another PMU feature.

But PMU interupts are not special, so it would be possible to e.g.,
take the timer interrupt before soft disable and have it touch the
watchdog if it fires while under local_irq_disable(). That give
exact same kind of pseudo-NMI as perf interrupts, without using PMU.

Further, we now want to introduce a local_pmu_disable() type of
interface that extends this soft disable layer to perf interrupts
as well for some cases. Once we start doing that, more code will
be exempt from the hardlockup watchdog, whereas a watchdog specific
hook from the timer interrupt would still cover it.  
Interesting.  Thanks for that info.

Do you think you can split the patch in half for me?  You are shuffling code
around and making subtle changes in the shuffled code.  It is hard to double
check everything.  Namely watchdog_nmi_reconfigure and watchdog_update_cpus
suddenly appear.

The first patch would be straight code movement, the second the real
changes.  I almost don't care if you throw in the LOCKUP->SOFTLOCKUP changes
in the first patch either.

I am really interested in the subtle changes to make sure you covered the
various race conditions that we have uncovered over the years.
Yes that makes sense, I'll do that.
I also would like to see a sample of the watchdog_nmi_reconfigure() you had
in mind.  Usually we hide all the nmi stuff underneath the watchdog
functions.  But those are threaded, which is what you are trying to avoid,
so the placement of the function makes sense but looks odd.
I'm just calling it after any sysctl changes or suspend/resume etc, and
the lockup detector I have basically just shuts down everything and
then re-starts with the new parameters (could be made much fancier but
I didn't see a need.

I will split out this patch and re-post to you with the powerpc patch in
the series that you can take a look at.

Thanks,
Nick
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help