Thread (47 messages) 47 messages, 5 authors, 2020-08-18

Re: [RFC 0/5] Introduce NMI aware serial drivers

From: Sumit Garg <hidden>
Date: 2020-08-12 14:53:02
Also in: linux-serial, lkml

Hi Doug,

On Tue, 11 Aug 2020 at 22:46, Doug Anderson [off-list ref] wrote:
Hi,

On Tue, Aug 11, 2020 at 7:58 AM Greg Kroah-Hartman
[off-list ref] wrote:
quoted
On Tue, Aug 11, 2020 at 07:59:24PM +0530, Sumit Garg wrote:
quoted
Hi Greg,

Thanks for your comments.

On Tue, 11 Aug 2020 at 19:27, Greg Kroah-Hartman
[off-list ref] wrote:
quoted
On Tue, Aug 11, 2020 at 07:20:26PM +0530, Sumit Garg wrote:
quoted
On Tue, 21 Jul 2020 at 17:40, Sumit Garg [off-list ref] wrote:
quoted
Make it possible for UARTs to trigger magic sysrq from an NMI. With the
advent of pseudo NMIs on arm64 it became quite generic to request serial
device interrupt as an NMI rather than IRQ. And having NMI driven serial
RX will allow us to trigger magic sysrq as an NMI and hence drop into
kernel debugger in NMI context.

The major use-case is to add NMI debugging capabilities to the kernel
in order to debug scenarios such as:
- Primary CPU is stuck in deadlock with interrupts disabled and hence
  doesn't honor serial device interrupt. So having magic sysrq triggered
  as an NMI is helpful for debugging.
- Always enabled NMI based magic sysrq irrespective of whether the serial
  TTY port is active or not.

Currently there is an existing kgdb NMI serial driver which provides
partial implementation in upstream to have a separate ttyNMI0 port but
that remained in silos with the serial core/drivers which made it a bit
odd to enable using serial device interrupt and hence remained unused. It
seems to be clearly intended to avoid almost all custom NMI changes to
the UART driver.

But this patch-set allows the serial core/drivers to be NMI aware which
in turn provides NMI debugging capabilities via magic sysrq and hence
there is no specific reason to keep this special driver. So remove it
instead.

Approach:
---------

The overall idea is to intercept serial RX characters in NMI context, if
those are specific to magic sysrq then allow corresponding handler to run
in NMI context. Otherwise, defer all other RX and TX operations onto IRQ
work queue in order to run those in normal interrupt context.

This approach is demonstrated using amba-pl011 driver.

Patch-wise description:
-----------------------

Patch #1 prepares magic sysrq handler to be NMI aware.
Patch #2 adds NMI framework to serial core.
Patch #3 and #4 demonstrates NMI aware uart port using amba-pl011 driver.
Patch #5 removes kgdb NMI serial driver.

Goal of this RFC:
-----------------

My main reason for sharing this as an RFC is to help decide whether or
not to continue with this approach. The next step for me would to port
the work to a system with an 8250 UART.
A gentle reminder to seek feedback on this series.
It's been on my list for a while.  I started it Friday but ran out of
time.  This week hasn't been going as smoothly as I hoped but I'll
prioritize this since it's been too long.
No worries and thanks for your feedback.
quoted
quoted
quoted
It's the middle of the merge window, and I can't do anything.

Also, I almost never review RFC patches as I have have way too many
patches that people think are "right" to review first...
Okay, I understand and I can definitely wait for your feedback.
My feedback here is this:
quoted
quoted
I suggest you work to flesh this out first and submit something that you
feels works properly.
:)
quoted
IIUC, in order to make this approach substantial I need to make it
work with 8250 UART (major serial driver), correct? As currently it
works properly for amba-pl011 driver.
Yes, try to do that, or better yet, make it work with all serial drivers
automatically.
A bit of early feedback...

Although I'm not sure we can do Greg's "make it work everywhere
automatically", it's possible you could get half of your patch done
automatically.  Specifically, your patch really does two things:

a) It leaves the serial port "active" all the time to look for sysrq.
In other words even if there is no serial client it's always reading
the port looking for characters.  IMO this concept should be separated
out from the NMI concept and _could_ automatically work for all serial
drivers.  You'd just need something in the serial core that acted like
a default client if nobody else opened the serial port.  The nice
thing here is that we go through all the normal code paths and don't
need special cases in the driver.
Okay, will try to explore this option to have default serial port
client. Would this client be active in normal serial operation or only
active when we have kgdb active? One drawback I see for normal
operation could be power management as if user is not using serial
port and would like to disable corresponding clock in order to reduce
power consumption.
b) It enables NMI for your particular serial driver.  This seems like
it'd be hard to do automatically because you can't do the same things
at NMI that you could do in a normal interrupt handler.
Agree.
NOTE: to me, a) is more important than b) (though it'd be nice to have
both).  This would be especially true the earlier you could make a)
work since the main time when an "agetty" isn't running on my serial
port to read characters is during bootup.

Why is b) less important to me? Sure, it would let you drop into the
debugger in the case where the CPU handling serial port interrupts is
hung with IRQs disabled, but it _woudln't_ let you drop into the
debugger in the case where a different CPU is hung with IRQs disabled.
To get that we need NMI roundup (which, I know, you are also working
on for arm64).  ...and, if we've got NMI roundup, presumably we can
find our way into the debugger by either moving the serial interrupt
to a different CPU ahead of time or using some type of lockup detector
(which I know you are also working on for arm64).
Thanks for sharing your preferences. I will try to get a) sorted out first.

Overall I agree with your approaches to debug hard-lockup scenarios
but they might not be so trivial for kernel engineers who doesn't
posses kernel debugging experience as you do. :)

And I still think NMI aware magic sysrq is useful for scenarios such as:
- Try to get system information during hard-lockup rather than just
panic via hard-lockup detection.
- Do normal start/stop debugger activity on a core which was stuck in
hard-lockup.
- Random boot freezes which are not easily reproducible.
One last bit of feedback is that I noticed that you didn't try to
implement the old "knock" functionality of the old NMI driver that's
being deleted.  That is: your new patches don't provide an alternate
way to drop into the debugger for systems where BREAK isn't hooked up.
That's not a hard requirement, but I was kinda hoping for it since I
have some systems that haven't routed BREAK properly.  ;-)
Yeah, this is on my TODO list to have a kgdb "knock" functionality to
be implemented via a common hook in serial core.
I'll try to get some more detailed feedback in the next few days.
Thanks. I do look forward to your feedback.

-Sumit
-Doug
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help