Re: [PATCH] lockdep: Fix inconsistency in irq tracking on NMIs
From: Peter Zijlstra <peterz@infradead.org>
Date: 2025-06-21 08:57:15
Also in:
lkml
On Fri, Jun 20, 2025 at 02:51:13PM +0200, Gabriele Monaco wrote:
local_irq_enable()
void trace_hardirqs_on(void)
{
if (tracing_irq_cpu) {
trace(irq_enable);
tracing_irq_cpu = 0;
}
/*
* NMI here
* tracing_irq_cpu == 0 (done tracing)
* lockdep_hardirqs_enabled == 0 (IRQs still disabled)
*/
irqentry_nmi_enter()
irq_state.lockdep = 0
trace(irq_disable);So you're saying this ^^^^^ is the actual problem?
irqentry_nmi_exit()
// irq_state.lockdep == 0
// do not trace(irq_enable)
Because this ^^^^ might lead one to
believe the lack of trace(irq_enable)
is the problem.
lockdep_hardirqs_on(); }
Because I'm thinking the trace(irq_disable) is actually correct. We are entering an NMI handler, and that very much has IRQs disabled.
quoted hunk ↗ jump to hunk
Prevent this scenario by checking lockdep_hardirqs_enabled to trace also on nmi_entry. Fixes: ba1f2b2eaa2a ("x86/entry: Fix NMI vs IRQ state tracking") Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: linux-trace-kernel@vger.kernel.org Signed-off-by: Gabriele Monaco <gmonaco@redhat.com> --- kernel/entry/common.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)diff --git a/kernel/entry/common.c b/kernel/entry/common.c index a8dd1f27417cf..7369132c00ba4 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c@@ -326,13 +326,15 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs) irq_state.lockdep = lockdep_hardirqs_enabled(); __nmi_enter(); - lockdep_hardirqs_off(CALLER_ADDR0); + if (irq_state.lockdep) + lockdep_hardirqs_off(CALLER_ADDR0);
This isn't needed... it is perfectly fine calling lockdep_hardirq_off() again here. You'll hit the redundant_hardirqs_off counter.
lockdep_hardirq_enter(); ct_nmi_enter(); instrumentation_begin(); kmsan_unpoison_entry_regs(regs); - trace_hardirqs_off_finish(); + if (irq_state.lockdep) + trace_hardirqs_off_finish();
So I really think you're doing the wrong thing here. We traced IRQs are
enabled, but then take an NMI, meaning IRQs are very much disabled. So
we want this irqs_off to fire.
The much more fun case is:
if (tracing_irq_cpu) {
trace(irq_enable);
<NMI>
Because then it will see tracing_irq_cpu set, but also have issued
irq_enable and not issue irq_disable, and then things are really messed
up.
So yes, you found a fun case, but your solution seemed aimed at pleasing
the model, rather than reality.