Re: [PATCH v2 bpf-next] bpf: sharing bpf runtime stats with /dev/bpf_stats
From: Daniel Borkmann <daniel@iogearbox.net>
Date: 2020-03-18 20:58:13
Also in:
bpf, linux-fsdevel
On 3/18/20 7:33 AM, Song Liu wrote:
On Mar 17, 2020, at 4:08 PM, Song Liu [off-list ref] wrote:
On Mar 17, 2020, at 2:47 PM, Daniel Borkmann [off-list ref] wrote:
Hm, true as well. Wouldn't long-term extending "bpftool prog profile" fentry/fexit programs supersede this old bpf_stats infrastructure? Iow, can't we implement the same (or even more elaborate stats aggregation) in BPF via fentry/fexit and then potentially deprecate bpf_stats counters?

I think run_time_ns has its own value as a simple monitoring framework. We can use it in tools like top (and variations). It will be easier for these tools to adopt run_time_ns than using fentry/fexit.

Agree that this is easier; I presume there is no such official integration today in tools like top, right, or is there anything planned?

Yes, we do want more support in different tools to increase the visibility. Here is the effort for atop: https://github.com/Atoptool/atop/pull/88 . I wasn't pushing hard on this one, mostly because the sysctl interface requires a user space "owner".
On the other hand, in the long term, we may include a few fentry/fexit based programs in the kernel binary (or the rpm), so that these tools can use them easily. At that time, we can fully deprecate run_time_ns. Maybe this is not too far away?

Did you check how feasible it is to have something like `bpftool prog profile top` which then enables fentry/fexit for /all/ existing BPF programs in the system? It could then sort the sample interval by run_cnt, cycles, cache misses, aggregated runtime, etc in a top-like output. Wdyt?

I wonder whether we can achieve this with one bpf prog (or a trampoline) that covers all BPF programs, like a trampoline inside __BPF_PROG_RUN()? For the long-term direction, I think we could compare two different approaches: add new tools (like bpftool prog profile top) vs. add BPF support to existing tools. The first approach is easier. The latter approach would show BPF information to users who are not expecting BPF programs on their systems. For many sysadmins, seeing BPF programs in top/ps, and controlling them via kill, is more natural than learning bpftool. What's your thought on this?

More thoughts on this. If we have a special trampoline that attaches to all BPF programs at once, we really don't need the run_time_ns stats anymore. Eventually, tools that monitor BPF programs will depend on libbpf, so using fentry/fexit to monitor BPF programs doesn't introduce an extra dependency. I guess we also need a way to include BPF programs in libbpf.

To summarize this plan, we need:
1) A global trampoline that attaches to all BPF programs at once;
Overall sounds good. I think the `at once` part might be tricky: at the least it would need to patch one prog after another, and each prog also needs to store its own metrics somewhere for later collection. The start-to-sample trigger could be a shared global var (aka a map shared between all the programs) which would flip the switch, though.
2) Embed fentry/fexit program in libbpf, which will be used by tools for monitoring;
3) BPF helpers to read time, which replaces current run_time_ns.

Does this look reasonable?

Thanks,
Song
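
To make the "patch one prog after another, per-prog metrics, shared switch" idea from this thread a bit more concrete, here is a minimal userspace sketch in Python. It is only a model under stated assumptions: `Profiler`, `ProgStats`, `attach` and `top` are hypothetical names, not kernel, libbpf or bpftool interfaces, and plain Python callables stand in for BPF programs.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class ProgStats:
    """Per-program counters (each prog stores its own metrics)."""
    run_cnt: int = 0
    run_time_ns: int = 0


class Profiler:
    """Hypothetical model of the scheme; not a real kernel or libbpf API."""

    def __init__(self) -> None:
        # Shared switch, standing in for the global var / shared map.
        self.sampling = False
        self.stats: Dict[str, ProgStats] = {}

    def attach(self, name: str, prog: Callable[..., Any]) -> Callable[..., Any]:
        """Wrap a single program; callers attach one prog after another."""
        stats = self.stats.setdefault(name, ProgStats())

        def wrapped(*args: Any) -> Any:
            if not self.sampling:
                # Switch is off: run the program without any accounting.
                return prog(*args)
            start = time.perf_counter_ns()  # entry side (fentry-like)
            try:
                return prog(*args)
            finally:
                # Exit side (fexit-like): update this prog's own counters.
                stats.run_cnt += 1
                stats.run_time_ns += time.perf_counter_ns() - start

        return wrapped

    def top(self, key: str = "run_cnt") -> List[Tuple[str, ProgStats]]:
        """Return programs sorted by the chosen counter, largest first."""
        return sorted(
            self.stats.items(),
            key=lambda item: getattr(item[1], key),
            reverse=True,
        )
```

A tool would call `attach()` for each program, set `sampling = True` for the sample interval, then print `top("run_cnt")` or `top("run_time_ns")`. Counters such as cycles or cache misses are left out here because they have no simple userspace stand-in.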