Thread: 8 messages, 3 authors, 2021-02-10

Re: [PATCH 4/4] perf tools: determine if LR is the return address

From: James Clark <hidden>
Date: 2021-01-26 20:32:38
Also in: lkml


On 24/01/2021 02:05, Jiri Olsa wrote:
On Fri, Jan 22, 2021 at 04:18:54PM +0000, Alexandre Truong wrote:
On arm64 in frame pointer mode (e.g. perf record --call-graph fp),
use dwarf unwind info to check if the link register is the return
address in order to inject it into the frame pointer stack.

Write the following application:

	int a = 10;

	void f2(void)
	{
		for (int i = 0; i < 1000000; i++)
			a *= a;
	}

	void f1()
	{
		f2();
	}

	int main (void)
	{
		f1();
		return 0;
	}

with the following compilation flags:
	gcc -g -fno-omit-frame-pointer -fno-inline -O1

The compiler omits the frame pointer for f2 on arm. This is a problem
with any leaf call, for example an application with many different
calls to malloc() would always omit the calling frame, even if it
can be determined.

	./perf record --call-graph fp ./a.out
	./perf report

currently gives the following stack:

0xffffea52f361
_start
__libc_start_main
main
f2
reproduced on x86 as well
+static bool get_leaf_frame_caller_enabled(struct perf_sample *sample)
+{
+	return callchain_param.record_mode != CALLCHAIN_FP || !sample->user_regs.regs
+		|| sample->user_regs.mask != PERF_REGS_MASK;
+}
+
+static int add_entry(struct unwind_entry *entry, void *arg)
+{
+	struct entries *entries = arg;
+
+	entries->stack[entries->i++] = entry->ip;
+	return 0;
+}
+
+u64 get_leaf_frame_caller_aarch64(struct perf_sample *sample, struct thread *thread)
+{
+	u64 leaf_frame;
+	struct entries entries = {{0, 0}, 0};
+
+	if (get_leaf_frame_caller_enabled(sample))
the name suggests you'd want to continue if it's true
+		return 0;
+
+	unwind__get_entries(add_entry, &entries, thread, sample, 2);
I'm scratching my head how this unwinds anything, you enabled just
registers, not the stack right? so the unwind code would do just
IP -> LR + 1 shift?
I think the idea about using libunwind is that the LR might not
be a valid return address. It could be used as a general purpose
register, or just not used at all.

Libunwind should be able to use the dwarf present in the binary to
unwind one frame, as long as nothing stored in the stack is needed.

But now that I look at the disassembly for this example, I see that f2()
just has a single 'b' instruction, and not 'bl', so the link register
won't be set. Also, 'f1' does store a few things on the stack.
Whether these are needed to unwind one frame, I'm not sure.

It could be that libunwind is falling back to a frame pointer unwind
mode, which we don't want.

I think it needs further investigation.
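
To make the check under discussion concrete, here is a minimal sketch of
the comparison the patch performs, pulled out of perf so it can be read
on its own. The struct and function names are hypothetical stand-ins, not
perf APIs. The "+ 1" is copied from the patch as-is; my assumption is that
it compensates for the unwinder reporting caller addresses adjusted by one,
but that has not been confirmed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the entries the patch collects with
 * unwind__get_entries(..., 2): the sampled IP plus one unwound caller.
 */
struct two_entries {
	uint64_t stack[2];
	size_t   n;	/* number of valid entries, at most 2 */
};

/*
 * Return LR if the DWARF-unwound caller agrees with it, otherwise 0.
 * caller_first mirrors callchain_param.order == ORDER_CALLER in the patch.
 */
static uint64_t leaf_caller_from_lr(const struct two_entries *e,
				    bool caller_first, uint64_t lr)
{
	uint64_t caller;

	assert(e->n <= 2);
	if (e->n < 2)
		return 0;	/* the unwinder could not step one frame */

	caller = caller_first ? e->stack[0] : e->stack[1];

	/* Same comparison as the patch: caller + 1 must equal LR. */
	return caller + 1 == lr ? lr : 0;
}
```

If LR is being used as a general purpose register, the comparison fails
and 0 is returned, so nothing is injected. Note that the sketch also
returns 0 when fewer than two entries were collected, which the patch as
posted does not check.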


James
thanks,
jirka
+	leaf_frame = callchain_param.order == ORDER_CALLER ?
+		entries.stack[0] : entries.stack[1];
+
+	if (leaf_frame + 1 == sample->user_regs.regs[PERF_REG_ARM64_LR])
+		return sample->user_regs.regs[PERF_REG_ARM64_LR];
+	return 0;
+}
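
On the naming point raised for get_leaf_frame_caller_enabled(): one
possible shape, sketched with simplified stand-in types, is to invert the
predicate so that "enabled" returns true when the lookup can run. The
enum, the struct and ALL_REGS_MASK below are hypothetical placeholders
for callchain_param.record_mode, sample->user_regs and PERF_REGS_MASK;
the mask value is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the perf types and constants in the patch. */
enum record_mode { MODE_NONE, MODE_FP, MODE_DWARF };
#define ALL_REGS_MASK 0xffffffffULL	/* placeholder for PERF_REGS_MASK */

struct sample_regs {
	const uint64_t *regs;	/* NULL when no user registers were sampled */
	uint64_t mask;		/* which registers were sampled */
};

/* True when the leaf-frame lookup can run, so the name matches the result. */
static bool leaf_frame_caller_enabled(enum record_mode mode,
				      const struct sample_regs *r)
{
	return mode == MODE_FP && r->regs != NULL && r->mask == ALL_REGS_MASK;
}
```

The caller would then bail out with "if (!leaf_frame_caller_enabled(...))
return 0;", which reads the way the name suggests.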
SNIP