Thread (50 messages) 50 messages, 6 authors, 2019-06-18

Re: [RFC PATCH 00/11] bpf, trace, dtrace: DTrace BPF program type implementation and sample use

From: Kris Van Hees <hidden>
Date: 2019-06-18 01:55:51
Also in: bpf, lkml

On Mon, Jun 17, 2019 at 06:32:22PM -0700, Alexei Starovoitov wrote:
On Mon, Jun 17, 2019 at 6:25 PM Kris Van Hees [off-list ref] wrote:
quoted
On Thu, May 23, 2019 at 01:28:44PM -0700, Alexei Starovoitov wrote:

<< stuff skipped because it is not relevant to the technical discussion... >>
quoted
quoted
quoted
In particular you brought up a good point that there is a use case
for sharing a piece of bpf program between kprobe and tracepoint events.
The better way to do that is via bpf2bpf call.
Example:
void bpf_subprog(arbitrary args)
{
}

SEC("kprobe/__set_task_comm")
int bpf_prog_kprobe(struct pt_regs *ctx)
{
  bpf_subprog(...);
}

SEC("tracepoint/sched/sched_switch")
int bpf_prog_tracepoint(struct sched_switch_args *ctx)
{
  bpf_subprog(...);
}

Such configuration is not supported by the verifier yet.
We've been discussing it for some time, but no work has started,
since there was no concrete use case.
If you can work on adding support for it everyone will benefit.

Could you please consider doing that as a step forward?
This definitely looks to be an interesting addition and I am happy to look into
that further.  I have a few questions that I hope you can shed light on...

1. What context would bpf_subprog execute with?  If it can be called from
   multiple different prog types, would it see whichever context the caller
   is executing with?  Or would you envision bpf_subprog to not be allowed to
   access the execution context because it cannot know which one is in use?
bpf_subprog() won't be able to access 'ctx' pointer _if_ it's ambiguous.
The verifier already smart enough to track all the data flow, so it's fine to
pass 'struct pt_regs *ctx' as long as it's accessed safely.
For example:
void bpf_subprog(int kind, struct pt_regs *ctx1, struct sched_switch_args *ctx2)
{
  if (kind == 1)
     bpf_printk("%d", ctx1->pc);
  if (kind == 2)
     bpf_printk("%d", ctx2->next_pid);
}

SEC("kprobe/__set_task_comm")
int bpf_prog_kprobe(struct pt_regs *ctx)
{
  bpf_subprog(1, ctx, NULL);
}

SEC("tracepoint/sched/sched_switch")
int bpf_prog_tracepoint(struct sched_switch_args *ctx)
{
  bpf_subprog(2, NULL, ctx);
}

The verifier should be able to prove that the above is correct.
It can do so already if s/ctx1/map_value1/, s/ctx2/map_value2/
What's missing is an ability to have more than one 'starting' or 'root caller'
program.

Now replace SEC("tracepoint/sched/sched_switch") with SEC("cgroup/ingress")
and it's becoming clear that BPF_PROG_TYPE_PROBE approach is not good enough, right?
Folks are already sharing the bpf progs between kprobe and networking.
Currently it's done via code duplication and actual sharing happens via maps.
That's not ideal, hence we've been discussing 'shared library' approach for
quite some time. We need a way to support common bpf functions that can be called
from networking and from tracing programs.
quoted
2. Given that BPF programs are loaded with a specification of the prog type,
   how would one load a code construct as the one you outline above?  How can
   you load a BPF function and have it be used as subprog from programs that
   are loaded separately?  I.e. in the sample above, if bpf_subprog is loaded
   as part of loading bpf_prog_kprobe (prog type KPROBE), how can it be
   referenced from bpf_prog_tracepoint (prog type TRACEPOINT) which would be
   loaded separately?
The api to support shared libraries was discussed, but not yet implemented.
We've discussed 'FD + name' approach.
FD identifies a loaded program (which is root program + a set of subprogs)
and other programs can be loaded at any time later. The BPF_CALL instructions
in such later program would refer to older subprogs via FD + name.
Note that both tracing and networking progs can be part of single elf file.
libbpf has to be smart to load progs into kernel step by step
and reusing subprogs that are already loaded.

Note that libbpf work for such feature can begin _without_ kernel changes.
libbpf can pass bpf_prog_kprobe+bpf_subprog as a single program first,
then pass bpf_prog_tracepoint+bpf_subprog second (as a separate program).
The bpf_subprog will be duplicated and JITed twice, but sharing will happen
because data structures (maps, global and static data) will be shared.
This way the support for 'pseudo shared libraries' can begin.
(later accompanied by FD+name kernel support)
As far as I can determine, the current libbpd implementation is already able
to do the duplication of the called function, even when the ELF object contains
programs of differemt program types.  I.e. the example you give at the top
of the email actually seems to work already.  Right?
Have you tried it?
Yes, of course.  I wouldn't want to make an unfounded claim.
quoted
In that case, I am a bit unsure what more can be done on the side of libbpf
without needing kernel changes?
it's a bit weird to discuss hypothetical kernel changes when the first step
of changing libbpf wasn't even attempted.
It is not hypothetical.  The folowing example works fine:

static int noinline bpf_action(void *ctx, long fd, long buf, long count)
{
        int                     cpu = bpf_get_smp_processor_id();
        struct data {
                u64     arg0;
                u64     arg1;
                u64     arg2;
        }                       rec;

        memset(&rec, 0, sizeof(rec));

        rec.arg0 = fd;
        rec.arg1 = buf;
        rec.arg2 = count;

        bpf_perf_event_output(ctx, &buffers, cpu, &rec, sizeof(rec));

        return 0;
}

SEC("kprobe/ksys_write")
int bpf_kprobe(struct pt_regs *ctx)
{
        return bpf_action(ctx, ctx->di, ctx->si, ctx->dx);
}

SEC("tracepoint/syscalls/sys_enter_write")
int bpf_tp(struct syscalls_enter_write_args *ctx)
{
        return bpf_action(ctx, ctx->fd, ctx->buf, ctx->count);
}

char _license[] SEC("license") = "GPL";
u32 _version SEC("version") = LINUX_VERSION_CODE;
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help