Re: [PATCH] arm64: smp: smp_send_stop() and crash_smp_send_stop() should try non-NMI first
From: Doug Anderson <dianders@chromium.org>
Date: 2024-02-29 18:34:47
Also in:
linux-hardening, lkml
Hi, On Wed, Feb 28, 2024 at 5:11 AM Daniel Thompson [off-list ref] wrote:
quoted
I'm still hoping to get some sort of feedback here. If people think this is a terrible idea then I'll shut up now and leave well enough alone, but it would be nice to actively decide and get the patch out of limbo.I've read patch through a couple of times and was generally convinced by the "do what x86 does" argument. However until now I've always held my council since I wasn't familiar with these code paths and I figured it was OK for me to have no opinion because the first line of the description says that kgdb/kdb is 100% not involved in causing the problem ;-) . However today I also took a look at the HAVE_NMI architectures and there is no consensus between them about how to implement this: PowerPC uses NMI and most of the others use IRQ only, s390 special cases for the panic code path and acts differently compared to a normal SMP shutdown.
Thanks for taking a look! I think I just included you since long ago you were involved in the pseudo-NMI patches. ;-)
FWIW the x86 route was irq-only and then switching to irq-plus-nmi (after a short trial with NMI-only that had problems with pstore reliability[1]) and that approach has been in place for over a decade now!
Ah, interesting. I guess this isn't a problem for me at the moment since we're not using any alternate pstore backends (ChromeOS just does pstore to RAM), but it's good to confirm that people were facing real issues. This matches what my gut told me: that it's nice to give CPUs a little chance to shut down more cleanly before jamming an NMI down their throats.
However, if we talking ourselves into copying x86 then perhaps we should more accurately copy x86! Assuming I read the x86 code correctly then crash_smp_send_stop() will (mostly) go staight to NMI rather than trialling an IRQ first! That is not what is currently implemented in the patch for arm64.
Sure, I'm happy to change the patch to work that way, though I might wait to get some confirmation from a maintainer that they think this idea is worth pursuing before spending more time on it. I don't think it would be hard to have the "crash stop" code jump straight to NMI if that's what people want. Matching x86 here seems reasonable, though I'd also say that my gut still says that even for crash stop we should try to stop things cleanly before jumping to NMI. I guess I could imagine that the code we're kexec-ing to generate the core file might be more likely to find the hardware in a funny state if we stopped CPUs w/ NMI vs IRQ. -Doug _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel