Thread: 14 messages, 4 authors, 2024-06-25

Re: [PATCH] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first

From: Doug Anderson <dianders@chromium.org>
Date: 2024-02-29 18:34:47
Also in: linux-hardening, lkml

Hi,

On Wed, Feb 28, 2024 at 5:11 AM Daniel Thompson
[off-list ref] wrote:
>> I'm still hoping to get some sort of feedback here. If people think
>> this is a terrible idea then I'll shut up now and leave well enough
>> alone, but it would be nice to actively decide and get the patch out
>> of limbo.
>
> I've read the patch through a couple of times and was generally
> convinced by the "do what x86 does" argument.
>
> However until now I've always held my counsel since I wasn't familiar
> with these code paths and I figured it was OK for me to have no opinion
> because the first line of the description says that kgdb/kdb is 100% not
> involved in causing the problem ;-) .
>
> However today I also took a look at the HAVE_NMI architectures and there
> is no consensus between them about how to implement this: PowerPC uses
> NMI and most of the others use IRQ only; s390 special cases for the
> panic code path and acts differently compared to a normal SMP shutdown.

Thanks for taking a look! I think I just included you since long ago
you were involved in the pseudo-NMI patches. ;-)

> FWIW the x86 route was irq-only and then switched to irq-plus-nmi
> (after a short trial with NMI-only that had problems with pstore
> reliability[1]) and that approach has been in place for over
> a decade now!

Ah, interesting. I guess this isn't a problem for me at the moment
since we're not using any alternate pstore backends (ChromeOS just
does pstore to RAM), but it's good to confirm that people were facing
real issues. This matches what my gut told me: that it's nice to give
CPUs a little chance to shut down more cleanly before jamming an NMI
down their throats.

> However, if we are talking ourselves into copying x86 then perhaps we
> should more accurately copy x86! Assuming I read the x86 code correctly
> then crash_smp_send_stop() will (mostly) go straight to NMI rather
> than trialling an IRQ first! That is not what is currently implemented
> in the patch for arm64.

Sure, I'm happy to change the patch to work that way, though I might
wait to get some confirmation from a maintainer that they think this
idea is worth pursuing before spending more time on it. I don't think
it would be hard to have the "crash stop" code jump straight to NMI if
that's what people want. Matching x86 here seems reasonable, though
I'd also say that my gut still says that even for crash stop we should
try to stop things cleanly before jumping to NMI. I guess I could
imagine that the code we're kexec-ing to generate the core file might
be more likely to find the hardware in a funny state if we stopped
CPUs w/ NMI vs IRQ.
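
To make the two policies under discussion concrete, here is a rough
user-space sketch. It is not kernel code and not what the patch does
line for line; every name in it (cpu_state, stop_policy, sim_send_stop,
deliver) is made up for illustration, and the real code waits with a
timeout between the IRQ attempt and the NMI, which this model collapses
into two passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of how a CPU reacts to a stop request. */
enum cpu_state {
    CPU_RESPONSIVE,  /* would handle a normal stop IPI */
    CPU_IRQS_MASKED, /* ignores the IPI; only an NMI reaches it */
    CPU_STOPPED,
};

/* The two behaviours being compared in this thread. */
enum stop_policy {
    STOP_IRQ_THEN_NMI, /* try IRQ first, use NMI only for stragglers */
    STOP_NMI_ONLY,     /* go straight to NMI */
};

struct stop_result {
    size_t stopped_by_irq;
    size_t stopped_by_nmi;
};

/* Deliver one round of stop requests; return how many CPUs it stopped. */
static size_t deliver(enum cpu_state *cpus, size_t n, bool nmi)
{
    size_t stopped = 0;

    assert(cpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cpus[i] == CPU_STOPPED)
            continue;
        /* An NMI stops any running CPU; an IRQ only a responsive one. */
        if (nmi || cpus[i] == CPU_RESPONSIVE) {
            cpus[i] = CPU_STOPPED;
            stopped++;
        }
    }
    return stopped;
}

/* Simulated stop sequence; the timeout wait is modelled as "next pass". */
static struct stop_result sim_send_stop(enum cpu_state *cpus, size_t n,
                                        enum stop_policy policy)
{
    struct stop_result r = { 0, 0 };

    if (policy == STOP_IRQ_THEN_NMI)
        r.stopped_by_irq = deliver(cpus, n, false);
    r.stopped_by_nmi = deliver(cpus, n, true);
    return r;
}
```

With one CPU stuck with IRQs masked and two responsive ones,
STOP_IRQ_THEN_NMI lets the two responsive CPUs stop via the normal
path and only NMIs the stuck one, while STOP_NMI_ONLY NMIs all three.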


-Doug

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel