Thread (24 messages) 24 messages, 3 authors, 2023-09-08

Re: [PATCH v4 6/9] tracing/fprobe: Enable fprobe events with CONFIG_DYNAMIC_FTRACE_WITH_ARGS

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: 2023-08-26 12:22:07
Also in: bpf, lkml

(Cc: Peter)

On Fri, 25 Aug 2023 18:12:07 +0200
Florent Revest [off-list ref] wrote:
On Wed, Aug 23, 2023 at 5:16 PM Masami Hiramatsu (Google)
[off-list ref] wrote:
quoted
diff --git a/kernel/trace/trace_fprobe.c b/kernel/trace/trace_fprobe.c
index c60d0d9f1a95..90ad28260a9f 100644
--- a/kernel/trace/trace_fprobe.c
+++ b/kernel/trace/trace_fprobe.c
+#else /* CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS && !CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */
+
+/* Since fprobe handlers can be nested, pt_regs buffer need to be a stack */
+#define PERF_FPROBE_REGS_MAX   4
+
+struct pt_regs_stack {
+       struct pt_regs regs[PERF_FPROBE_REGS_MAX];
+       int idx;
+};
+
+static DEFINE_PER_CPU(struct pt_regs_stack, perf_fprobe_regs);
+
+static __always_inline
+struct pt_regs *perf_fprobe_partial_regs(struct ftrace_regs *fregs)
+{
+       struct pt_regs_stack *stack = this_cpu_ptr(&perf_fprobe_regs);
+       struct pt_regs *regs;
+
+       if (stack->idx < PERF_FPROBE_REGS_MAX) {
+               regs = stack->regs[stack->idx++];
This is missing an &:
regs = &stack->regs[stack->idx++];
Oops, good point. I'm curious it didin't cause compile error...
(I thought I built it on arm64)
quoted
+               return ftrace_partial_regs(fregs, regs);
I think this is incorrect on arm64 and will likely cause very subtle
failure modes down the line on other architectures too. The problem on
arm64 is that Perf calls "user_mode(regs)" somewhere down the line,
that macro tries to read the "pstate" register, which is not populated
in ftrace_regs, so it's not copied into a "partial" pt_regs either and
Perf can take wrong decisions based on that.
I think we can assure the ftrace_regs is always !user_mode() so in that case
ftrace_partial_regs() should fill the 'pstate' register as kernel mode.
I already mentioned this problem in the past:
- in the third answer block of:
https://lore.kernel.org/all/CABRcYmJjtVq-330ktqTAUiNO1=yG_aHd0xz=c550O5C7QP++UA@mail.gmail.com/ (local)
- in the fourth answer block of:
https://lore.kernel.org/all/CABRcYm+esb8J2O1v6=C+h+HSa5NxraPUgo63w7-iZj0CXbpusg@mail.gmail.com/ (local)
Oops, sorry I missed that. And I basically agreed that we need a special
care for perf. Let me reply it.
It is quite possible that other architectures at some point introduce
a light ftrace "args" trampoline that misses one of the registers
expected by Perf because they don't realize that this trampoline calls
fprobe which calls Perf which has specific registers expectations.
Agreed.
We got the green light from Alexei to use ftrace_partial_regs for "BPF
mutli_kprobe" because these BPF programs can gracefully deal with
sparse pt_regs but I think a similar conversation needs to happen with
the Perf folks.
Indeed. Who is the best person to involve, Peterz? (but I think
we need arm64 PMU part maintainer to talk)
----

On a side-note, a subtle difference between ftrace_partial_regs with
and without HAVE_PT_REGS_TO_FTRACE_REGS_CAST is that one does a copy
and the other does not. If a subsystem receives a partial regs under
HAVE_PT_REGS_TO_FTRACE_REGS_CAST, it can modify register fields and
the modified values will be restored by the ftrace trampoline. Without
HAVE_PT_REGS_TO_FTRACE_REGS_CAST, only the copy will be modified and
ftrace won't restore them. I think the least we can do is to document
thoroughly the guarantees of the ftrace_partial_regs API: users
shouldn't rely on modifying the resulting regs because depending on
the architecture this could do different things. People shouldn't rely
on any register that isn't covered by one of the ftrace_regs_get_*
helpers because it can be unpopulated on some architectures. I believe
this is the case for BPF multi_kprobe but not for Perf.
I agree with the documentation requirement, but since the fprobe official
interface becomes ftrace_regs, user naturally expects it is not pt_regs.
The problem is that the perf's case. Since the perf is natively only
support pt_regs (and there is no reason to support ftrace_regs, yes).
Hmm, I will recheck how the perf events on trace-event is implementd.

Thank you,
quoted
+       }
+       return NULL;
+}
+
+static __always_inline void perf_fprobe_return_regs(struct pt_regs *regs)
+{
+       struct pt_regs_stack *stack = this_cpu_ptr(&perf_fprobe_regs);
+
+       if (WARN_ON_ONCE(regs != stack->regs[stack->idx]))
This is missing an & too:
if (WARN_ON_ONCE(regs != &stack->regs[stack->idx]))



quoted
+               return;
+
+       --stack->idx;
+}
+
+#endif /* !CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS || CONFIG_HAVE_PT_REGS_TO_FTRACE_REGS_CAST */

-- 
Masami Hiramatsu (Google) [off-list ref]
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help