Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for... | bpf

[PATCH bpf-next v4 00/27] x86/ftrace/bpf: Add batch support for direct/tracing attach · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 01/27] x86/ftrace: Remove extra orig rax move · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 02/27] x86/ftrace: Remove fault protection code in prepare_ftrace_return · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 03/27] x86/ftrace: Make function graph use ftrace directly · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 04/27] tracing: Add trampoline/graph selftest · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 05/27] ftrace: Add ftrace_add_rec_direct function · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 06/27] ftrace: Add multi direct register/unregister interface · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 07/27] ftrace: Add multi direct modify interface · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 08/27] ftrace/samples: Add multi direct interface test module · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 09/27] bpf: Add support to load multi func tracing program · Jiri Olsa <hidden> · 2021-08-26
Re: [PATCH bpf-next v4 09/27] bpf: Add support to load multi func tracing program · Andrii Nakryiko <hidden> · 2021-08-31
Re: [PATCH bpf-next v4 09/27] bpf: Add support to load multi func tracing program · Jiri Olsa <hidden> · 2021-09-01
[PATCH bpf-next v4 10/27] bpf: Add struct bpf_tramp_node layer · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 11/27] bpf: Factor out bpf_trampoline_init function · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 12/27] bpf: Factor out __bpf_trampoline_lookup function · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 13/27] bpf: Factor out __bpf_trampoline_put function · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 14/27] bpf: Change bpf_trampoline_get to return error pointer · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 15/27] bpf, x64: Allow to use caller address from stack · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 16/27] bpf: Add bpf_trampoline_multi_get/put functions · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 17/27] bpf: Add multi trampoline attach support · Jiri Olsa <hidden> · 2021-08-26
Re: [PATCH bpf-next v4 17/27] bpf: Add multi trampoline attach support · Andrii Nakryiko <hidden> · 2021-08-31
Re: [PATCH bpf-next v4 17/27] bpf: Add multi trampoline attach support · Andrii Nakryiko <hidden> · 2021-09-01
Re: [PATCH bpf-next v4 17/27] bpf: Add multi trampoline attach support · Jiri Olsa <hidden> · 2021-09-01
[PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for trampoline with multi func programs · Jiri Olsa <hidden> · 2021-08-26
Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for trampoline with multi func programs · Andrii Nakryiko <hidden> · 2021-08-31
Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for trampoline with multi func programs · Jiri Olsa <hidden> · 2021-09-01
Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for trampoline with multi func programs · Andrii Nakryiko <hidden> · 2021-09-02
Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for trampoline with multi func programs · Jiri Olsa <hidden> · 2021-09-02
Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for trampoline with multi func programs · Andrii Nakryiko <hidden> · 2021-09-02
Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for trampoline with multi func programs · Alexei Starovoitov <hidden> · 2021-09-02
Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for trampoline with multi func programs · Jiri Olsa <hidden> · 2021-09-03
[PATCH bpf-next v4 19/27] bpf: Attach multi trampoline with ftrace_ops · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 20/27] libbpf: Add btf__find_by_glob_kind function · Jiri Olsa <hidden> · 2021-08-26
Re: [PATCH bpf-next v4 20/27] libbpf: Add btf__find_by_glob_kind function · Andrii Nakryiko <hidden> · 2021-09-01
Re: [PATCH bpf-next v4 20/27] libbpf: Add btf__find_by_glob_kind function · Jiri Olsa <hidden> · 2021-09-01
[PATCH bpf-next v4 21/27] libbpf: Add support to link multi func tracing program · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 22/27] selftests/bpf: Add fentry multi func test · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 23/27] selftests/bpf: Add fexit multi func test · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 25/27] selftests/bpf: Add mixed multi func test · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 26/27] selftests/bpf: Add attach multi func test · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 27/27] selftests/bpf: Add ret_mod multi func test · Jiri Olsa <hidden> · 2021-08-26
[PATCH bpf-next v4 24/27] selftests/bpf: Add fentry/fexit multi func test · Jiri Olsa <hidden> · 2021-08-26
Re: [PATCH bpf-next v4 00/27] x86/ftrace/bpf: Add batch support for direct/tracing attach · Alexei Starovoitov <hidden> · 2021-08-29
Re: [PATCH bpf-next v4 00/27] x86/ftrace/bpf: Add batch support for direct/tracing attach · Jiri Olsa <hidden> · 2021-08-30

Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for trampoline with multi func programs

From: Jiri Olsa <hidden>
Date: 2021-09-01 15:15:24
Also in: netdev

On Tue, Aug 31, 2021 at 04:51:18PM -0700, Andrii Nakryiko wrote:

On Thu, Aug 26, 2021 at 12:41 PM Jiri Olsa [off-list ref] wrote:

quoted

When we have multi func program attached, the trampoline
switched to the function model of the multi func program.

This breaks already attached standard programs, for example
when we attach following program:

  SEC("fexit/bpf_fentry_test2")
  int BPF_PROG(test1, int a, __u64 b, int ret)

the trampoline pushes on stack args 'a' and 'b' and return
value 'ret'.

When following multi func program is attached to bpf_fentry_test2:

  SEC("fexit.multi/bpf_fentry_test*")
  int BPF_PROG(test2, __u64 a, __u64 b, __u64 c, __u64 d,
                       __u64 e, __u64 f, int ret)

the trampoline takes this program model and pushes all 6 args
and return value on stack.

But we still have the original 'test1' program attached, that
expects 'ret' value where there's 'c' argument now:

  test1(a, b, c)

To fix that we simply overwrite 'c' argument with 'ret' value,
so test1 is called as expected and test2 gets called as:

  test2(a, b, ret, d, e, f, ret)

which is ok, because 'c' is not defined for bpf_fentry_test2
anyway.

What if we change the order on the stack to be the return value first,
followed by input arguments. That would get us a bit closer to
unifying multi-trampoline and the normal one, right? BPF verifier
should be able to rewrite access to the last argument (i.e., return
value) for fexit programs to actually be at offset 0, and shift all
other arguments by 8 bytes. For fentry, if that helps to keep things
more aligned, we'd just skip the first 8 bytes on the stack and store
all the input arguments in the same offsets. So BPF verifier rewriting
logic stays consistent (except offset 0 will be disallowed).

nice idea, with this in place we could cut that args re-arranging code

Basically, I'm thinking how we can make normal and multi trampolines
more interoperable to remove those limitations that two
multi-trampolines can't be attached to the same function, which seems
like a pretty annoying limitation which will be easy to hit in
practice. Alexei previously proposed (as an optimization) to group all
to-be-attached functions into groups by number of arguments, so that
we can have up to 6 different trampolines tailored to actual functions
being attached. So that we don't save unnecessary extra input
arguments saving, which will be even more important once we allow more
than 6 arguments in the future.

With such logic, we should be able to split all the functions into
multiple underlying trampolines, so it seems like it should be
possible to also allow multiple multi-fentry programs to be attached
to the same function by having a separate bpf_trampoline just for
those functions. It will be just an extension of the above "just 6
trampolines" strategy to "as much as we need trampolines".

I'm probably missing something here.. say we have 2 functions with single
argument:

  foo1(int a)
  foo2(int b)

then having 2 programs:

  A - attaching to foo1
  B - attaching to foo2

then you need to have 2 different trampolines instead of single 'generic-1-argument-trampoline'

It's just a vague idea, sorry, I don't understand all the code yet.
But the limitation outlined in one of the previous patches seems very
limiting and unpleasant. I can totally see that some 24/7 running BPF
tracing app uses multi-fentry for tracing a small subset of kernel
functions non-stop, and then someone is trying to use bpftrace or
retsnoop to trace overlapping set of functions. And it immediately
fails. Very frustrating.

so the current approach is to some extent driven by the direct ftrace
batch API:

  you have ftrace_ops object and set it up with functions you want
  to change with calling:

  ftrace_set_filter_ip(ops, ip1);
  ftrace_set_filter_ip(ops, ip2);
  ...

and then register trampoline with those functions:

  register_ftrace_direct_multi(ops, tramp_addr);

and with this call being the expensive one (it does the actual work
and sync waiting), my objective was to call it just once for update

now with 2 intersecting multi trampolines we end up with 3 functions
sets:

  A - functions for first multi trampoline
  B - functions for second multi trampoline
  C - intersection of them

each set needs different trampoline:

  tramp A - calls program for first multi trampoline
  tramp B - calls program for second multi trampoline
  tramp C - calls both programs

so we need to call register_ftrace_direct_multi 3 times

if we allow also standard trampolines being attached, it makes
it even more complicated and ultimatelly gets broken to
1-function/1-trampoline pairs, ending up with attach speed
that we have now

...

I have test code for ftrace direct interface that would
allow to register/change separate function/addr pairs,
so in one call you can change multiple ips each to
different tramp addresss

but even with that, I ended up with lot of new complexity
on bpf side keeping track of multi trampolines intersections,
so I thought I'd start with something limited and simpler

perhaps I should move back to that approach and see how bad
it ends ;-)

or this could be next step on top of current work, that should
get simpler with the args re-arranging you proposed

jirka

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help