Re: [PATCH v4 6/9] tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: 2023-08-26 12:22:07
Also in:
bpf, lkml
(Cc: Peter) On Fri, 25 Aug 2023 18:12:07 +0200 Florent Revest [off-list ref] wrote:
On Wed, Aug 23, 2023 at 5:16 PM Masami Hiramatsu (Google) [off-list ref] wrote:quoted
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c index c60d0d9f1a95..90ad28260a9f 100644 --- a/kernel/trace/trace_fprobe.c +++ b/kernel/trace/trace_fprobe.c +#else /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS && !CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */ + +/* Since fprobe handlers can be nested, pt_regs buffer need to be a stack */ +#define PERF_FPROBE_REGS_MAX 4 + +struct pt_regs_stack { + struct pt_regs regs[PERF_FPROBE_REGS_MAX]; + int idx; +}; + +static DEFINE_PER_CPU(struct pt_regs_stack, perf_fprobe_regs); + +static __always_inline +struct pt_regs *perf_fprobe_partial_regs(struct ftrace_regs *fregs) +{ + struct pt_regs_stack *stack = this_cpu_ptr(&perf_fprobe_regs); + struct pt_regs *regs; + + if (stack->idx < PERF_FPROBE_REGS_MAX) { + regs = stack->regs[stack->idx++];This is missing an &: regs = &stack->regs[stack->idx++];
Oops, good point. I'm curious it didin't cause compile error... (I thought I built it on arm64)
quoted
+ return ftrace_partial_regs(fregs, regs);I think this is incorrect on arm64 and will likely cause very subtle failure modes down the line on other architectures too. The problem on arm64 is that Perf calls "user_mode(regs)" somewhere down the line, that macro tries to read the "pstate" register, which is not populated in ftrace_regs, so it's not copied into a "partial" pt_regs either and Perf can take wrong decisions based on that.
I think we can assure the ftrace_regs is always !user_mode() so in that case ftrace_partial_regs() should fill the 'pstate' register as kernel mode.
I already mentioned this problem in the past: - in the third answer block of: https://lore.kernel.org/all/CABRcYmJjtVq-330ktqTAUiNO1=yG_aHd0xz=c550O5C7QP++UA@mail.gmail.com/ (local) - in the fourth answer block of: https://lore.kernel.org/all/CABRcYm+esb8J2O1v6=C+h+HSa5NxraPUgo63w7-iZj0CXbpusg@mail.gmail.com/ (local)
Oops, sorry I missed that. And I basically agreed that we need a special care for perf. Let me reply it.
It is quite possible that other architectures at some point introduce a light ftrace "args" trampoline that misses one of the registers expected by Perf because they don't realize that this trampoline calls fprobe which calls Perf which has specific registers expectations.
Agreed.
We got the green light from Alexei to use ftrace_partial_regs for "BPF mutli_kprobe" because these BPF programs can gracefully deal with sparse pt_regs but I think a similar conversation needs to happen with the Perf folks.
Indeed. Who is the best person to involve, Peterz? (but I think we need arm64 PMU part maintainer to talk)
---- On a side-note, a subtle difference between ftrace_partial_regs with and without HAVE_PT_REGS_TO_FTRACE_REGS_CAST is that one does a copy and the other does not. If a subsystem receives a partial regs under HAVE_PT_REGS_TO_FTRACE_REGS_CAST, it can modify register fields and the modified values will be restored by the ftrace trampoline. Without HAVE_PT_REGS_TO_FTRACE_REGS_CAST, only the copy will be modified and ftrace won't restore them. I think the least we can do is to document thoroughly the guarantees of the ftrace_partial_regs API: users shouldn't rely on modifying the resulting regs because depending on the architecture this could do different things. People shouldn't rely on any register that isn't covered by one of the ftrace_regs_get_* helpers because it can be unpopulated on some architectures. I believe this is the case for BPF multi_kprobe but not for Perf.
I agree with the documentation requirement, but since the fprobe official interface becomes ftrace_regs, user naturally expects it is not pt_regs. The problem is that the perf's case. Since the perf is natively only support pt_regs (and there is no reason to support ftrace_regs, yes). Hmm, I will recheck how the perf events on trace-event is implementd. Thank you,
quoted
+ } + return NULL; +} + +static __always_inline void perf_fprobe_return_regs(struct pt_regs *regs) +{ + struct pt_regs_stack *stack = this_cpu_ptr(&perf_fprobe_regs); + + if (WARN_ON_ONCE(regs != stack->regs[stack->idx]))This is missing an & too: if (WARN_ON_ONCE(regs != &stack->regs[stack->idx]))quoted
+ return; + + --stack->idx; +} + +#endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
-- Masami Hiramatsu (Google) [off-list ref]