Re: [RFC PATCH v1 0/4] arm64: Implement stack trace reliability checks | linux-arm-kernel

(off-list ancestor, not in this archive)

On Tue, Mar 30, 2021 at 02:09:51PM -0500, madvenka@linux.microsoft.com wrote:
From: "Madhavan T. Venkataraman" <redacted>

There are a number of places in kernel code where the stack trace is not
reliable. Enhance the unwinder to check for those cases and mark the
stack trace as unreliable. Once all of the checks are in place, the unwinder
can be used for livepatching.
This assumes all such places are known.  That's a big assumption, as

a) hand-written asm code may inadvertantly skip frame pointer setup;

b) for inline asm which calls a function, the compiler may blindly
   insert it into a function before the frame pointer setup.

That's where objtool stack validation would come in.

Detect EL1 exception frame
==========================

EL1 exceptions can happen on any instruction including instructions in
the frame pointer prolog or epilog. Depending on where exactly they happen,
they could render the stack trace unreliable.

Add all of the EL1 exception handlers to special_functions[].

	- el1_sync()
	- el1_irq()
	- el1_error()
	- el1_sync_invalid()
	- el1_irq_invalid()
	- el1_fiq_invalid()
	- el1_error_invalid()
A possibly more robust alternative would be to somehow mark el1
exception frames so the unwinder can detect them more generally.

For example, as described in my previous email, encode the frame pointer
so the unwinder can detect el1 frames automatically.

Detect ftrace frame
===================

When FTRACE executes at the beginning of a traced function, it creates two
frames and calls the tracer function:

	- One frame for the traced function

	- One frame for the caller of the traced function

That gives a sensible stack trace while executing in the tracer function.
When FTRACE returns to the traced function, the frames are popped and
everything is back to normal.

However, in cases like live patch, the tracer function redirects execution
to a different function. When FTRACE returns, control will go to that target
function. A stack trace taken in the tracer function will not show the target
function. The target function is the real function that we want to track.
So, the stack trace is unreliable.
I don't think this is a real problem.  Livepatch only checks the stacks
of blocked tasks (and the task calling into livepatch).  So the
reliability of unwinding from the livepatch tracer function itself
(klp_ftrace_handler) isn't a concern since it doesn't sleep.

To detect FTRACE in a stack trace, add the following to special_functions[]:

	- ftrace_graph_call()
	- ftrace_graph_caller()

Please see the diff for a comment that explains why ftrace_graph_call()
must be checked.

Also, the Function Graph Tracer modifies the return address of a traced
function to a return trampoline (return_to_handler()) to gather tracing
data on function return. Stack traces taken from the traced function and
functions it calls will not show the original caller of the traced function.
The unwinder handles this case by getting the original caller from FTRACE.

However, stack traces taken from the trampoline itself and functions it calls
are unreliable as the original return address may not be available in
that context. This is because the trampoline calls FTRACE to gather trace
data as well as to obtain the actual return address and FTRACE discards the
record of the original return address along the way.
Again, this shouldn't be a concern because livepatch won't be unwinding
from a function_graph trampoline unless it got preempted somehow (and
then the el1 frame would get detected anyway).

Add return_to_handler() to special_functions[].

Check for kretprobe
===================

For functions with a kretprobe set up, probe code executes on entry
to the function and replaces the return address in the stack frame with a
kretprobe trampoline. Whenever the function returns, control is
transferred to the trampoline. The trampoline eventually returns to the
original return address.

A stack trace taken while executing in the function (or in functions that
get called from the function) will not show the original return address.
Similarly, a stack trace taken while executing in the trampoline itself
(and functions that get called from the trampoline) will not show the
original return address. This means that the caller of the probed function
will not show. This makes the stack trace unreliable.

Add the kretprobe trampoline to special_functions[].

FYI, each task contains a task->kretprobe_instances list that can
theoretically be consulted to find the orginal return address. But I am
not entirely sure how to safely traverse that list for stack traces
not on the current process. So, I have taken the easy way out.
For kretprobes, unwinding from the trampoline or kretprobe handler
shouldn't be a reliability concern for live patching, for similar
reasons as above.

Otherwise, when unwinding from a blocked task which has
'kretprobe_trampoline' on the stack, the unwinder needs a way to get the
original return address.  Masami has been working on an interface to
make that possible for x86.  I assume something similar could be done
for arm64.

Optprobes
=========

Optprobes may be implemented in the future for arm64. For optprobes,
the relevant trampoline(s) can be added to special_functions[].
Similar comment here, livepatch doesn't unwind from such trampolines.

-- 
Josh

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help