[PATCH v4 2/4] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()
From: Will Deacon <hidden>
Date: 2018-11-14 22:30:31
Also in:
linux-mips, linux-sh, linuxppc-dev, lkml
On Mon, Nov 12, 2018 at 10:26:56AM -0800, Douglas Anderson wrote:
When I had lockdep turned on and dropped into kgdb I got a nice splat on my system. Specifically it hit: DEBUG_LOCKS_WARN_ON(current->hardirq_context) Specifically it looked like this: sysrq: SysRq : DEBUG ------------[ cut here ]------------ DEBUG_LOCKS_WARN_ON(current->hardirq_context) WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27 pstate: 604003c9 (nZCv DAIF +PAN -UAO) pc : lockdep_hardirqs_on+0xf0/0x160 ... Call trace: lockdep_hardirqs_on+0xf0/0x160 trace_hardirqs_on+0x188/0x1ac kgdb_roundup_cpus+0x14/0x3c kgdb_cpu_enter+0x53c/0x5cc kgdb_handle_exception+0x180/0x1d4 kgdb_compiled_brk_fn+0x30/0x3c brk_handler+0x134/0x178 do_debug_exception+0xfc/0x178 el1_dbg+0x18/0x78 kgdb_breakpoint+0x34/0x58 sysrq_handle_dbg+0x54/0x5c __handle_sysrq+0x114/0x21c handle_sysrq+0x30/0x3c qcom_geni_serial_isr+0x2dc/0x30c ... ... irq event stamp: ...45 hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4 hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130 softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34 softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100 ---[ end trace adf21f830c46e638 ]--- Looking closely at it, it seems like a really bad idea to be calling local_irq_enable() in kgdb_roundup_cpus(). If nothing else that seems like it could violate spinlock semantics and cause a deadlock. Instead, let's use a private csd alongside smp_call_function_single_async() to round up the other CPUs. Using smp_call_function_single_async() doesn't require interrupts to be enabled so we can remove the offending bit of code. In order to avoid duplicating this across all the architectures that use the default kgdb_roundup_cpus(), we'll add a "weak" implementation to debug_core.c. Looking at all the people who previously had copies of this code, there were a few variants. I've attempted to keep the variants working like they used to. Specifically: * For arch/arc we passed NULL to kgdb_nmicallback() instead of get_irq_regs(). * For arch/mips there was a bit of extra code around kgdb_nmicallback() NOTE: In this patch we will still get into trouble if we try to round up a CPU that failed to round up before. We'll try to round it up again and potentially hang when we try to grab the csd lock. That's not new behavior but we'll still try to do better in a future patch. Suggested-by: Daniel Thompson <redacted> Signed-off-by: Douglas Anderson <dianders@chromium.org> --- Changes in v4: None Changes in v3: - No separate init call. - Don't round up the CPU that is doing the rounding up. - Add "#ifdef CONFIG_SMP" to match the rest of the file. - Updated desc saying we don't solve the "failed to roundup" case. - Document the ignored parameter. Changes in v2: - Removing irq flags separated from fixing lockdep splat. - Don't use smp_call_function (Daniel). arch/arc/kernel/kgdb.c | 10 ++-------- arch/arm/kernel/kgdb.c | 12 ------------ arch/arm64/kernel/kgdb.c | 12 ------------ arch/hexagon/kernel/kgdb.c | 27 --------------------------- arch/mips/kernel/kgdb.c | 9 +-------- arch/powerpc/kernel/kgdb.c | 4 ++-- arch/sh/kernel/kgdb.c | 12 ------------ include/linux/kgdb.h | 15 +++++++++++++-- kernel/debug/debug_core.c | 35 +++++++++++++++++++++++++++++++++++ 9 files changed, 53 insertions(+), 83 deletions(-)
[...]
quoted hunk ↗ jump to hunk
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c index f3cadda45f07..23f2b5613afa 100644 --- a/kernel/debug/debug_core.c +++ b/kernel/debug/debug_core.c@@ -55,6 +55,7 @@ #include <linux/mm.h> #include <linux/vmacache.h> #include <linux/rcupdate.h> +#include <linux/irq.h> #include <asm/cacheflush.h> #include <asm/byteorder.h>@@ -220,6 +221,40 @@ int __weak kgdb_skipexception(int exception, struct pt_regs *regs) return 0; } +#ifdef CONFIG_SMP + +/* + * Default (weak) implementation for kgdb_roundup_cpus + */ + +static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd); + +void __weak kgdb_call_nmi_hook(void *ignored) +{ + kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs()); +}
I suppose you could pass the cpu as an argument, but it doesn't really matter. Also, I think there are cases where the CSD callback can run without having received an IPI, so we could potentially up passing NULL for the regs here which probably goes boom.
+
+void __weak kgdb_roundup_cpus(void)
+{
+ call_single_data_t *csd;
+ int this_cpu = get_cpu();Do you actually need to disable preemption here? afaict, irqs are already disabled by the kgdb core.
+ int cpu;
+
+ for_each_cpu(cpu, cpu_online_mask) {for_each_online_cpu(cpu) ? I'm assuming this is serialised wrt CPU hotplug somehow? Will