Thread (55 messages) 55 messages, 3 authors, 2021-03-23

Re: [RFC PATCH v2 1/8] arm64: Implement stack trace termination record

From: Madhavan T. Venkataraman <hidden>
Date: 2021-03-19 22:04:06
Also in: live-patching, lkml
Subsystem: arm64 port (aarch64 architecture), the rest · Maintainers: Catalin Marinas, Will Deacon, Linus Torvalds


On 3/19/21 1:19 PM, Madhavan T. Venkataraman wrote:

On 3/19/21 9:29 AM, Madhavan T. Venkataraman wrote:
quoted

On 3/19/21 7:30 AM, Mark Brown wrote:
quoted
On Thu, Mar 18, 2021 at 03:26:13PM -0500, Madhavan T. Venkataraman wrote:
quoted
On 3/18/21 10:09 AM, Mark Brown wrote:
quoted
quoted
If we are going to add the extra record there would probably be less
potential for confusion if we pointed it at some sensibly named dummy
function so anything or anyone that does see it on the stack doesn't get
confused by a NULL.
quoted
I agree. I will think about this some more. If no other solution presents
itself, I will add the dummy function.
After discussing this with Mark Rutland offlist he convinced me that so
long as we ensure the kernel doesn't print the NULL record we're
probably OK here, the effort setting the function pointer up correctly
in all circumstances (especially when we're not in the normal memory
map) is probably not worth it for the limited impact it's likely to have
to see the NULL pointer (probably mainly a person working with some
external debugger).  It should be noted in the changelog though, and/or
merged in with the relevant change to the unwinder.
OK. I will add a comment as well as note it in the changelog.

Thanks to both of you.

Madhavan
I thought about this some more. I think I have a simple solution. I will
prepare a patch and send it out. If you and Mark Rutland approve, I will
include it in the next version of this RFC.

Madhavan
I solved this by using existing functions logically instead of inventing a
dummy function. I initialize pt_regs->stackframe[1] to an existing function
so that the stack trace will not show a 0x0 entry as well as the kernel and
gdb will show identical stack traces.

For all task stack traces including the idle tasks, the stack trace will
end at copy_thread() as copy_thread() is the function that initializes the
pt_regs and the first stack frame for a task.

For EL0 exceptions, the stack trace will end with vectors() as vectors
entries call the EL0 handlers.

Here are sample stack traces (I only show the ending of each trace):

Idle task on primary CPU
========================

		 ...
[    0.022557]   start_kernel+0x5b8/0x5f4
[    0.022570]   __primary_switched+0xa8/0xb8
[    0.022578]   copy_thread+0x0/0x188

Idle task on secondary CPU
==========================

		 ...
[    0.023397]   secondary_start_kernel+0x188/0x1e0
[    0.023406]   __secondary_switched+0x40/0x88
[    0.023415]   copy_thread+0x0/0x188

All other kernel threads
========================

		 ...
[   13.501062]   ret_from_fork+0x10/0x18
[   13.507998]   copy_thread+0x0/0x188

User threads (EL0 exception)
============

write(2) system call example:

		 ...
[  521.686148]   vfs_write+0xc8/0x2c0
[  521.686156]   ksys_write+0x74/0x108
[  521.686161]   __arm64_sys_write+0x24/0x30
[  521.686166]   el0_svc_common.constprop.0+0x70/0x1a8
[  521.686175]   do_el0_svc+0x2c/0x98
[  521.686180]   el0_svc+0x2c/0x70
[  521.686188]   el0_sync_handler+0xb0/0xb8
[  521.686193]   el0_sync+0x17c/0x180
[  521.686198]   vectors+0x0/0x7d8

Here are the code changes:

========================================================================
diff --git a/arch/arm64/kernel/entry.S b/arch/arm64/kernel/entry.S
index a31a0a713c85..514307e80b79 100644
--- a/arch/arm64/kernel/entry.S
+++ b/arch/arm64/kernel/entry.S
@@ -261,16 +261,19 @@ alternative_else_nop_endif
 	stp	lr, x21, [sp, #S_LR]
 
 	/*
-	 * For exceptions from EL0, terminate the callchain here.
+	 * For exceptions from EL0, terminate the callchain here at
+	 * task_pt_regs(current)->stackframe.
+	 *
 	 * For exceptions from EL1, create a synthetic frame record so the
 	 * interrupted code shows up in the backtrace.
 	 */
 	.if \el == 0
-	mov	x29, xzr
+	ldr	x17, =vectors
+	stp	xzr, x17, [sp, #S_STACKFRAME]
 	.else
 	stp	x29, x22, [sp, #S_STACKFRAME]
-	add	x29, sp, #S_STACKFRAME
 	.endif
+	add	x29, sp, #S_STACKFRAME
 
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 alternative_if_not ARM64_HAS_PAN
diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
index 66b0e0b66e31..699e0dd313a1 100644
--- a/arch/arm64/kernel/head.S
+++ b/arch/arm64/kernel/head.S
@@ -393,6 +393,29 @@ SYM_FUNC_START_LOCAL(__create_page_tables)
 	ret	x28
 SYM_FUNC_END(__create_page_tables)
 
+	/*
+	 * The boot task becomes the idle task for the primary CPU. The
+	 * CPU startup task on each secondary CPU becomes the idle task
+	 * for the secondary CPU.
+	 *
+	 * The idle task does not require pt_regs. But create a dummy
+	 * pt_regs so that task_pt_regs(idle_task)->stackframe can be
+	 * set up to be the last frame on the idle task stack just like
+	 * all the other kernel tasks. This helps the unwinder to
+	 * terminate the stack trace at a well-known stack offset.
+	 *
+	 * Also, set up the last return PC to be copy_thread() just
+	 * like all the other kernel tasks so that the stack trace of
+	 * all kernel tasks ends with the same function.
+	 */
+	.macro setup_last_frame
+	sub	sp, sp, #PT_REGS_SIZE
+	ldr	x17, =copy_thread
+	stp	xzr, x17, [sp, #S_STACKFRAME]
+	add	x29, sp, #S_STACKFRAME
+	adr	x30, #0
+	.endm
+
 /*
  * The following fragment of code is executed with the MMU enabled.
  *
@@ -447,8 +470,7 @@ SYM_FUNC_START_LOCAL(__primary_switched)
 #endif
 	bl	switch_to_vhe			// Prefer VHE if possible
 	add	sp, sp, #16
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 	b	start_kernel
 SYM_FUNC_END(__primary_switched)
 
@@ -606,8 +628,7 @@ SYM_FUNC_START_LOCAL(__secondary_switched)
 	cbz	x2, __secondary_too_slow
 	msr	sp_el0, x2
 	scs_load x2, x3
-	mov	x29, #0
-	mov	x30, #0
+	setup_last_frame
 
 #ifdef CONFIG_ARM64_PTR_AUTH
 	ptrauth_keys_init_cpu x2, x3, x4, x5
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 325c83b1a24d..9050699ff67c 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -437,6 +437,12 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start,
 	}
 	p->thread.cpu_context.pc = (unsigned long)ret_from_fork;
 	p->thread.cpu_context.sp = (unsigned long)childregs;
+	/*
+	 * For the benefit of the unwinder, set up childregs->stackframe
+	 * as the last frame for the new task.
+	 */
+	p->thread.cpu_context.fp = (unsigned long)childregs->stackframe;
+	childregs->stackframe[1] = (unsigned long)copy_thread;
 
 	ptrace_hw_copy_thread(p);
 
======================================================================

If you approve, the above will become RFC Patch v3 1/8 in the next version.

Let me know.

Also, I could introduce an extra frame in the EL1 exception stack trace that
includes vectors so the stack trace would look like this (timer interrupt example):

call_timer_fn
run_timer_softirq
__do_softirq
irq_exit
__handle_domain_irq
gic_handle_irq
el1_irq
vectors

This way, if the unwinder finds vectors, it knows that it is an exception frame.

Madhavan

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help