Thread (3 messages) 3 messages, 3 authors, 2025-06-21

Re: [PATCH] lockdep: Fix inconsistency in irq tracking on NMIs

From: Peter Zijlstra <peterz@infradead.org>
Date: 2025-06-21 08:57:15
Also in: lkml

On Fri, Jun 20, 2025 at 02:51:13PM +0200, Gabriele Monaco wrote:
 local_irq_enable()
   void trace_hardirqs_on(void)
   {
   	if (tracing_irq_cpu) {
   		trace(irq_enable);
   		tracing_irq_cpu = 0;
   	}

                /*
                 * NMI here
                 * tracing_irq_cpu == 0 (done tracing)
                 * lockdep_hardirqs_enabled == 0 (IRQs still disabled)
                 */

                                   irqentry_nmi_enter()
                                       irq_state.lockdep = 0
                                       trace(irq_disable);
				So you're saying this ^^^^^ is the
				actual problem?
                                   irqentry_nmi_exit()
                                       // irq_state.lockdep == 0
                                       // do not trace(irq_enable)
                                Because this ^^^^ might lead one to
				believe the lack of trace(irq_enable)
				is the problem.
   	lockdep_hardirqs_on();
   }

Because I'm thinking the trace(irq_disable) is actually correct. We are
entering an NMI handler, and that very much has IRQs disabled.
quoted hunk ↗ jump to hunk
Prevent this scenario by checking lockdep_hardirqs_enabled to trace also
on nmi_entry.

Fixes: ba1f2b2eaa2a ("x86/entry: Fix NMI vs IRQ state tracking")
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: linux-trace-kernel@vger.kernel.org
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
---
 kernel/entry/common.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index a8dd1f27417cf..7369132c00ba4 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -326,13 +326,15 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 	irq_state.lockdep = lockdep_hardirqs_enabled();
 
 	__nmi_enter();
-	lockdep_hardirqs_off(CALLER_ADDR0);
+	if (irq_state.lockdep)
+		lockdep_hardirqs_off(CALLER_ADDR0);
This isn't needed... it is perfectly fine calling lockdep_hardirq_off()
again here. You'll hit the redundant_hardirqs_off counter.
 	lockdep_hardirq_enter();
 	ct_nmi_enter();
 
 	instrumentation_begin();
 	kmsan_unpoison_entry_regs(regs);
-	trace_hardirqs_off_finish();
+	if (irq_state.lockdep)
+		trace_hardirqs_off_finish();
So I really think you're doing the wrong thing here. We traced IRQs are
enabled, but then take an NMI, meaning IRQs are very much disabled. So
we want this irqs_off to fire.

The much more fun case is:

	if (tracing_irq_cpu) {
		trace(irq_enable);
		<NMI>

Because then it will see tracing_irq_cpu set, but also have issued
irq_enable and not issue irq_disable, and then things are really messed
up.


So yes, you found a fun case, but your solution seemed aimed at pleasing
the model, rather than reality.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help