Re: [PATCH bpf-next 06/29] bpf: Add bpf_arg/bpf_ret_value helpers for... | bpf

Re: [PATCH bpf-next 06/29] bpf: Add bpf_arg/bpf_ret_value helpers for tracing programs

From: Jiri Olsa <hidden>
Date: 2021-11-28 18:08:17
Also in: netdev

On Wed, Nov 24, 2021 at 01:43:22PM -0800, Andrii Nakryiko wrote:

On Thu, Nov 18, 2021 at 3:25 AM Jiri Olsa [off-list ref] wrote:

quoted

Adding bpf_arg/bpf_ret_value helpers for tracing programs
that returns traced function arguments.

Get n-th argument of the traced function:
  long bpf_arg(void *ctx, int n)

Get return value of the traced function:
  long bpf_ret_value(void *ctx)

The trampoline now stores number of arguments on ctx-8
address, so it's easy to verify argument index and find
return value argument.

Moving function ip address on the trampoline stack behind
the number of functions arguments, so it's now stored
on ctx-16 address.

Both helpers are inlined by verifier.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
---

It would be great to land these changes separate from your huge patch
set. There are some upcoming BPF trampoline related changes that will
touch this (to add BPF cookie support for fentry/fexit progs), so
would be nice to minimize the interdependencies. So maybe post this
patch separately (probably after holidays ;) ).

ok

quoted

 arch/x86/net/bpf_jit_comp.c    | 18 +++++++++++---
 include/uapi/linux/bpf.h       | 14 +++++++++++
 kernel/bpf/verifier.c          | 45 ++++++++++++++++++++++++++++++++--
 kernel/trace/bpf_trace.c       | 38 +++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h | 14 +++++++++++
 5 files changed, 122 insertions(+), 7 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 631847907786..67e8ac9aaf0d 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c

@@ -1941,7 +1941,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
                                void *orig_call)
 {
        int ret, i, nr_args = m->nr_args;
-       int stack_size = nr_args * 8;
+       int stack_size = nr_args * 8 + 8 /* nr_args */;

this /* nr_args */ next to 8 is super confusing, would be better to
expand the comment; might be a good idea to have some sort of a
description of possible stack layouts (e.g., fexit has some extra
stuff on the stack, I think, but it's impossible to remember and need
to recover that knowledge from the assembly code, basically).

quoted

        struct bpf_tramp_progs *fentry = &tprogs[BPF_TRAMP_FENTRY];
        struct bpf_tramp_progs *fexit = &tprogs[BPF_TRAMP_FEXIT];
        struct bpf_tramp_progs *fmod_ret = &tprogs[BPF_TRAMP_MODIFY_RETURN];

@@ -1987,12 +1987,22 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
                EMIT4(0x48, 0x83, 0xe8, X86_PATCH_SIZE);
                emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -stack_size);

-               /* Continue with stack_size for regs storage, stack will
-                * be correctly restored with 'leave' instruction.
-                */
+               /* Continue with stack_size for 'nr_args' storage */

same, I don't think this comment really helps, just confuses some more

ok, I'll put some more comments with list of possible the stack layouts

quoted

                stack_size -= 8;
        }

+       /* Store number of arguments of the traced function:
+        *   mov rax, nr_args
+        *   mov QWORD PTR [rbp - stack_size], rax
+        */
+       emit_mov_imm64(&prog, BPF_REG_0, 0, (u32) nr_args);
+       emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -stack_size);
+
+       /* Continue with stack_size for regs storage, stack will
+        * be correctly restored with 'leave' instruction.
+        */
+       stack_size -= 8;

I think "stack_size" as a name outlived itself and it just makes
everything harder to understand. It's used more like a stack offset
(relative to rsp or rbp) for different things. Would it make code
worse if we had few offset variables instead (or rather in addition,
we still need to calculate a full stack_size; it's just it's constant
re-adjustment is what's hard to keep track of), like regs_off,
ret_ip_off, arg_cnt_off, etc?

let's see, I'll try that

quoted

+
        save_regs(m, &prog, nr_args, stack_size);

        if (flags & BPF_TRAMP_F_CALL_ORIG) {

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a69e4b04ffeb..fc8b344eecba 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h

@@ -4957,6 +4957,18 @@ union bpf_attr {
  *             **-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
  *             **-EBUSY** if failed to try lock mmap_lock.
  *             **-EINVAL** for invalid **flags**.
+ *
+ * long bpf_arg(void *ctx, int n)

__u32 n ?

ok

quoted

+ *     Description
+ *             Get n-th argument of the traced function (for tracing programs).
+ *     Return
+ *             Value of the argument.

What about errors? those need to be documented.

ok

quoted

+ *
+ * long bpf_ret_value(void *ctx)
+ *     Description
+ *             Get return value of the traced function (for tracing programs).
+ *     Return
+ *             Return value of the traced function.

Same, errors not documented. Also would be good to document what
happens when ret_value is requested in the context where there is no
ret value (e.g., fentry)

ugh, that's not handled at the moment.. should we fail when
we see bpf_ret_value helper call in fentry program?

quoted

  */
 #define __BPF_FUNC_MAPPER(FN)          \
        FN(unspec),                     \

@@ -5140,6 +5152,8 @@ union bpf_attr {
        FN(skc_to_unix_sock),           \
        FN(kallsyms_lookup_name),       \
        FN(find_vma),                   \
+       FN(arg),                        \
+       FN(ret_value),                  \

We already have bpf_get_func_ip, so why not continue a tradition and
call these bpf_get_func_arg() and bpf_get_func_ret(). Nice, short,
clean, consistent.

ok

BTW, a wild thought. Wouldn't it be cool to have these functions work
with kprobe/kretprobe as well? Do you think it's possible?

right, bpf_get_func_ip already works for kprobes

struct kprobe could have the btf_func_model of the traced function,
so in case we trace function directly on the entry point we could
read arguments registers based on the btf_func_model

I'll check with Massami

quoted

        /* */

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index fac0c3518add..d4249ef6ca7e 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c

@@ -13246,11 +13246,52 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
                        continue;
                }

+               /* Implement bpf_arg inline. */
+               if (prog_type == BPF_PROG_TYPE_TRACING &&
+                   insn->imm == BPF_FUNC_arg) {
+                       /* Load nr_args from ctx - 8 */
+                       insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
+                       insn_buf[1] = BPF_JMP32_REG(BPF_JGE, BPF_REG_2, BPF_REG_0, 4);
+                       insn_buf[2] = BPF_ALU64_IMM(BPF_MUL, BPF_REG_2, 8);
+                       insn_buf[3] = BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_1);
+                       insn_buf[4] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0);
+                       insn_buf[5] = BPF_JMP_A(1);
+                       insn_buf[6] = BPF_MOV64_IMM(BPF_REG_0, 0);
+
+                       new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 7);
+                       if (!new_prog)
+                               return -ENOMEM;
+
+                       delta    += 6;
+                       env->prog = prog = new_prog;
+                       insn      = new_prog->insnsi + i + delta;
+                       continue;

nit: this whole sequence of steps and calculations seems like
something that might be abstracted and hidden behind a macro or helper
func? Not related to your change, though. But wouldn't it be easier to
understand if it was just written as:

PATCH_INSNS(
    BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
    BPF_JMP32_REG(BPF_JGE, BPF_REG_2, BPF_REG_0, 4);
    BPF_ALU64_IMM(BPF_MUL, BPF_REG_2, 8);
    BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_1);
    BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0);
    BPF_JMP_A(1);
    BPF_MOV64_IMM(BPF_REG_0, 0));
continue;

yep, looks better ;-) I'll check

quoted

+               }
+
+               /* Implement bpf_ret_value inline. */
+               if (prog_type == BPF_PROG_TYPE_TRACING &&
+                   insn->imm == BPF_FUNC_ret_value) {
+                       /* Load nr_args from ctx - 8 */
+                       insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -8);
+                       insn_buf[1] = BPF_ALU64_IMM(BPF_MUL, BPF_REG_2, 8);
+                       insn_buf[2] = BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_1);
+                       insn_buf[3] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0);
+
+                       new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 4);
+                       if (!new_prog)
+                               return -ENOMEM;
+
+                       delta    += 3;
+                       env->prog = prog = new_prog;
+                       insn      = new_prog->insnsi + i + delta;
+                       continue;
+               }
+
                /* Implement bpf_get_func_ip inline. */
                if (prog_type == BPF_PROG_TYPE_TRACING &&
                    insn->imm == BPF_FUNC_get_func_ip) {
-                       /* Load IP address from ctx - 8 */
-                       insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
+                       /* Load IP address from ctx - 16 */
+                       insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -16);

                        new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 1);
                        if (!new_prog)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 25ea521fb8f1..3844cfb45490 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c

@@ -1012,7 +1012,7 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = {
 BPF_CALL_1(bpf_get_func_ip_tracing, void *, ctx)
 {
        /* This helper call is inlined by verifier. */
-       return ((u64 *)ctx)[-1];
+       return ((u64 *)ctx)[-2];
 }

 static const struct bpf_func_proto bpf_get_func_ip_proto_tracing = {

@@ -1091,6 +1091,38 @@ static const struct bpf_func_proto bpf_get_branch_snapshot_proto = {
        .arg2_type      = ARG_CONST_SIZE_OR_ZERO,
 };

+BPF_CALL_2(bpf_arg, void *, ctx, int, n)
+{
+       /* This helper call is inlined by verifier. */
+       u64 nr_args = ((u64 *)ctx)[-1];
+
+       if ((u64) n >= nr_args)
+               return 0;

We'll need bpf_get_func_arg_cnt() helper as well to be able to know
the actual number of arguments traced function has. It's impossible to
know whether the argument is zero or there is no argument, otherwise.

my idea was that the program will call those helpers based
on get_func_ip with proper argument indexes

but with bpf_get_func_arg_cnt we could make a simple program
that would just print function with all its arguments easily, ok ;-)

quoted

+       return ((u64 *)ctx)[n];
+}
+
+static const struct bpf_func_proto bpf_arg_proto = {
+       .func           = bpf_arg,
+       .gpl_only       = true,
+       .ret_type       = RET_INTEGER,
+       .arg1_type      = ARG_PTR_TO_CTX,
+       .arg1_type      = ARG_ANYTHING,
+};
+
+BPF_CALL_1(bpf_ret_value, void *, ctx)
+{
+       /* This helper call is inlined by verifier. */
+       u64 nr_args = ((u64 *)ctx)[-1];
+
+       return ((u64 *)ctx)[nr_args];

we should return 0 for fentry or disable this helper for anything but
fexit? It's going to return garbage otherwise.

disabling seems like right choice to me

thanks,
jirka

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help