Thread (55 messages) 55 messages, 3 authors, 2021-03-23

Re: [RFC PATCH v2 4/8] arm64: Detect an EL1 exception frame and mark a stack trace unreliable

From: Madhavan T. Venkataraman <hidden>
Date: 2021-03-23 12:47:09
Also in: linux-arm-kernel, lkml


On 3/23/21 5:42 AM, Mark Rutland wrote:
On Mon, Mar 15, 2021 at 11:57:56AM -0500, madvenka@linux.microsoft.com wrote:
quoted
From: "Madhavan T. Venkataraman" <redacted>

EL1 exceptions can happen on any instruction including instructions in
the frame pointer prolog or epilog. Depending on where exactly they happen,
they could render the stack trace unreliable.

If an EL1 exception frame is found on the stack, mark the stack trace as
unreliable.

Now, the EL1 exception frame is not at any well-known offset on the stack.
It can be anywhere on the stack. In order to properly detect an EL1
exception frame the following checks must be done:

	- The frame type must be EL1_FRAME.

	- When the register state is saved in the EL1 pt_regs, the frame
	  pointer x29 is saved in pt_regs->regs[29] and the return PC
	  is saved in pt_regs->pc. These must match with the current
	  frame.
Before you can do this, you need to reliably identify that you have a
pt_regs on the stack, but this patch uses a heuristic, which is not
reliable.

However, instead you can identify whether you're trying to unwind
through one of the EL1 entry functions, which tells you the same thing
without even having to look at the pt_regs.

We can do that based on the entry functions all being in .entry.text,
which we could further sub-divide to split the EL0 and EL1 entry
functions.
Yes. I will check the entry functions. But I still think that we should
not rely on just one check. The additional checks will make it robust.
So, I suggest that the return address be checked first. If that passes,
then we can be reasonably sure that there are pt_regs. Then, check
the other things in pt_regs.
quoted
Interrupts encountered in kernel code are also EL1 exceptions. At the end
of an interrupt, the interrupt handler checks if the current task must be
preempted for any reason. If so, it calls the preemption code which takes
the task off the CPU. A stack trace taken on the task after the preemption
will show the EL1 frame and will be considered unreliable. This is correct
behavior as preemption can happen practically at any point in code
including the frame pointer prolog and epilog.

Breakpoints encountered in kernel code are also EL1 exceptions. The probing
infrastructure uses breakpoints for executing probe code. While in the probe
code, the stack trace will show an EL1 frame and will be considered
unreliable. This is also correct behavior.

Signed-off-by: Madhavan T. Venkataraman <redacted>
---
 arch/arm64/include/asm/stacktrace.h |  2 +
 arch/arm64/kernel/stacktrace.c      | 57 +++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)
diff --git a/arch/arm64/include/asm/stacktrace.h b/arch/arm64/include/asm/stacktrace.h
index eb29b1fe8255..684f65808394 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -59,6 +59,7 @@ struct stackframe {
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 	int graph;
 #endif
+	bool reliable;
 };
 
 extern int unwind_frame(struct task_struct *tsk, struct stackframe *frame);
@@ -169,6 +170,7 @@ static inline void start_backtrace(struct stackframe *frame,
 	bitmap_zero(frame->stacks_done, __NR_STACK_TYPES);
 	frame->prev_fp = 0;
 	frame->prev_type = STACK_TYPE_UNKNOWN;
+	frame->reliable = true;
 }
 
 #endif	/* __ASM_STACKTRACE_H */
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 504cd161339d..6ae103326f7b 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -18,6 +18,58 @@
 #include <asm/stack_pointer.h>
 #include <asm/stacktrace.h>
 
+static void check_if_reliable(unsigned long fp, struct stackframe *frame,
+			      struct stack_info *info)
+{
+	struct pt_regs *regs;
+	unsigned long regs_start, regs_end;
+
+	/*
+	 * If the stack trace has already been marked unreliable, just
+	 * return.
+	 */
+	if (!frame->reliable)
+		return;
+
+	/*
+	 * Assume that this is an intermediate marker frame inside a pt_regs
+	 * structure created on the stack and get the pt_regs pointer. Other
+	 * checks will be done below to make sure that this is a marker
+	 * frame.
+	 */
Sorry, but NAK to this approach specifically. This isn't reliable (since
it can be influenced by arbitrary data on the stack), and it's far more
complicated than identifying the entry functions specifically.
As I mentioned above, I agree that we should check the return address. But
just as a precaution, I think we should double check the pt_regs.

Is that OK with you? It does not take away anything or increase the risk in
anyway. I think it makes it more robust.

Thanks.

Madhavan
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help