Thread (57 messages) 57 messages, 5 authors, 2024-09-10

Re: [PATCH v2] tracing/uprobe: Add missing PID filter for uretprobe

From: Jiri Olsa <hidden>
Date: 2024-09-06 10:43:58
Also in: bpf

On Mon, Sep 02, 2024 at 03:22:25AM +0800, Tianyi Liu wrote:
On Fri, Aug 30, 2024 at 18:12:41PM +0800, Oleg Nesterov wrote:
quoted
The whole discussion was very confusing (yes, I too contributed to the
confusion ;), let me try to summarise.
quoted
U(ret)probes are designed to be filterable using the PID, which is the
second parameter in the perf_event_open syscall. Currently, uprobe works
well with the filtering, but uretprobe is not affected by it.
And this is correct. But the CONFIG_BPF_EVENTS code in __uprobe_perf_func()
misunderstands the purpose of uprobe_perf_filter().

Lets forget about BPF for the moment. It is not that uprobe_perf_filter()
does the filtering by the PID, it doesn't. We can simply kill this function
and perf will work correctly. The perf layer in __uprobe_perf_func() does
the filtering when perf_event->hw.target != NULL.

So why does uprobe_perf_filter() call uprobe_perf_filter()? Not to avoid
the __uprobe_perf_func() call (as the BPF code assumes), but to trigger
unapply_uprobe() in handler_chain().

Suppose you do, say,

	$ perf probe -x /path/to/libc some_hot_function
or
	$ perf probe -x /path/to/libc some_hot_function%return
then
	$perf record -e ... -p 1

to trace the usage of some_hot_function() in the init process. Everything
will work just fine if we kill uprobe_perf_filter()->uprobe_perf_filter().

But. If INIT forks a child C, dup_mm() will copy int3 installed by perf.
So the child C will hit this breakpoint and cal handle_swbp/etc for no
reason every time it calls some_hot_function(), not good.

That is why uprobe_perf_func() calls uprobe_perf_filter() which returns
UPROBE_HANDLER_REMOVE when C hits the breakpoint. handler_chain() will
call unapply_uprobe() which will remove this breakpoint from C->mm.
quoted
We found that the filter function was not invoked when uretprobe was
initially implemented, and this has been existing for ten years.
See above, this is correct.

Note also that if you only use perf-probe + perf-record, no matter how
many instances, you can even add BUG_ON(!uprobe_perf_filter(...)) into
uretprobe_perf_func(). IIRC, perf doesn't use create_local_trace_uprobe().
Thanks for the detailed explanation above, I can understand the code now. 
Yes, I completely misunderstood the purpose of uprobe_perf_filter, 
sorry for the confusion.
quoted
---------------------------------------------------------------------------
Now lets return to BPF and this particular problem. I won't really argue
with this patch, but

	- Please change the subject and update the changelog,
	  "Fixes: c1ae5c75e103" and the whole reasoning is misleading
	  and wrong, IMO.

	- This patch won't fix all problems because uprobe_perf_filter()
	  filters by mm, not by pid. The changelog/patch assumes that it
	  is a "PID filter", but it is not.

	  See https://lore.kernel.org/linux-trace-kernel/20240825224018.GD3906@redhat.com/ (local)
	  If the traced process does clone(CLONE_VM), bpftrace will hit the
	  similar problem, with uprobe or uretprobe.

	- So I still think that the "right" fix should change the
	  bpf_prog_run_array_uprobe() paths somehow, but I know nothing
	  about bpf.
I agree that this patch does not address the issue correctly. 
The PID filter should be implemented within bpf_prog_run_array_uprobe, 
or alternatively, bpf_prog_run_array_uprobe should be called after 
perf_tp_event_match to reuse the filtering mechanism provided by perf.

Also, uretprobe may need UPROBE_HANDLER_REMOVE, similar to uprobe.

For now, please forget the original patch as we need a new solution ;)
hi,
any chance we could go with your fix until we find better solution?

it's simple and it fixes most of the cases for return uprobe pid filter
for events with bpf programs.. I know during the discussion we found
that standard perf record path won't work if there's bpf program
attached on the same event, but I think that likely it needs more
changes and also it's not a common use case

would you consider sending another version addressing Oleg's points
for changelog above?

thanks,
jirka
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help