Thread (3 messages) 3 messages, 2 authors, 2018-12-14

Re: BUG: ftrace/perf dropping events at the begin of interrupt handlers

From: Steven Rostedt <rostedt@goodmis.org>
Date: 2018-12-04 19:16:51
Also in: lkml
Subsystem: the rest · Maintainer: Linus Torvalds

On Thu, 22 Nov 2018 10:45:05 +0100
Daniel Bristot de Oliveira [off-list ref] wrote:
While developing the automata [1], I've hit cases in which need resched
and/or sched wakeup events were being fired with preemption and/or
interrupts enabled. However, this is not possible because interrupts must
be disabled to avoid concurrence with an interrupt handler, and the
preemption must be disabled to avoid concurrence with the scheduler.
The tool I use to validate the model is based on perf, and it was
complaining about this situation. I’ve talked to Arnaldo about it
two months.

Further debug on perf has shown that those cases always took place
associated with the occurrence of interrupts. At ELC Europe
Marko Pusch (Siemens) also mentioned hitting cases in which he saw
missing events related to IRQ handling, but using ftrace. Steven and
I also discussed this during the last Plumbers (Vancouver - CA) and we
agreed that there is a problem on ftrace too.

To reproduce this problem with ftrace, one needs to enable function
tracer and do kernel operations in a CPU in which IRQs are taking place.

For instance, in a single CPU VM, run:

# while [ 1 ]; do echo > /dev/null; done

In a shell, and 

# trace-cmd record -b [enough buff to avoid missing trace because of buffer overun] -p function sleep 5

In another shell.

Then, using trace-cmd report --debug, we can see the problem. Here is one
example of output, first the expected one:
================ Faulty execution =================

Thus, ftrace and perf, sometimes, are dropping events at the begin of
interrupt handlers. And that is The reason why interrupt disable and
preempt disable were not being recorded (and causing problem in the
automata execution) is that these events take place in the very early
execution of the interrupt handler, in the section in which the
perf/ftrace are dropping events [ without notifying ].

[1] This is a good demonstration of problems that can be found using the
automata model presented in this workshop paper:

Modeling the Behavior of Threads in the PREEMPT_RT Linux Kernel Using
Automata. Daniel Bristot de Oliveira, Tommaso Cucinotta, Rômulo Silva de
Oliveira - EWiLi'2018 – Embedded operating system workshop Torino, Italy,
4 October 2018.

And in the presentations:
"Mind the gap between real-time Linux and real-time theory"
"How can we catch problems that can break the PREEMPT_RT preemption model?"
At the Linux Plumbers (Vancouver - CA)

Steven is already aware of this problem, and he is working on it.
Yes, it's a simple fix. The problem is that the recursion detection of
the function tracer requires that when its called from interrupt, the
"in_interrupt" needs to be true, otherwise it thinks that the function
tracer is recursing on itself (which is common).

Looking an the dropped events, and the code in __irq_enter() we have
this:

#define __irq_enter()					\
	do {						\
		account_irq_enter_time(current);	\
		preempt_count_add(HARDIRQ_OFFSET);	\ <<-- in_interrupt() returns true here
		trace_hardirq_enter();			\
	} while (0)

Interesting enough, the dropped events happen to be in
account_irq_enter_time()!

Thus what I believe is happening is that an interrupt came in while one
event was being recorded. When account_irq_enter_time was called, the
function tracer noticed that its recursion bit for the current context
was already set, and just dropped the event because it thought it was
just tracing itself. After we add HARDIRQ_OFFSET to preempt_count, the
"in_interrupt()" will be set and the function tracer will know its in a
new context where its safe to continue tracing.

Can you try this patch to see if it fixes it for you?

-- Steve
diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 0fbbcdf0c178..0290531ebe3c 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -35,8 +35,8 @@ extern void rcu_nmi_exit(void);
  */
 #define __irq_enter()					\
 	do {						\
-		account_irq_enter_time(current);	\
 		preempt_count_add(HARDIRQ_OFFSET);	\
+		account_irq_enter_time(current);	\
 		trace_hardirq_enter();			\
 	} while (0)
 
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help