Re: [PATCH v2 bpf-next] bpf: sharing bpf runtime stats with /dev/bpf_stats
From: Daniel Borkmann <daniel@iogearbox.net>
Date: 2020-03-17 21:47:06
Also in:
linux-fsdevel, netdev
On 3/17/20 9:13 PM, Song Liu wrote:
> On Mar 17, 2020, at 1:03 PM, Daniel Borkmann wrote:
>> On 3/17/20 8:54 PM, Song Liu wrote:
>>> On Mar 17, 2020, at 12:30 PM, Daniel Borkmann wrote:
>>>> On 3/16/20 9:33 PM, Song Liu wrote:
>>>>> Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats.
>>>>> Typical userspace tools use kernel.bpf_stats_enabled as follows:
>>>>>
>>>>>   1. Enable kernel.bpf_stats_enabled;
>>>>>   2. Check program run_time_ns;
>>>>>   3. Sleep for the monitoring period;
>>>>>   4. Check program run_time_ns again, calculate the difference;
>>>>>   5. Disable kernel.bpf_stats_enabled.
>>>>>
>>>>> The problem with this approach is that only one userspace tool can
>>>>> toggle this sysctl. If multiple tools toggle the sysctl at the same
>>>>> time, the measurement may be inaccurate.
>>>>>
>>>>> To fix this problem while keeping backward compatibility, introduce a
>>>>> new bpf command BPF_ENABLE_RUNTIME_STATS. On success, this command
>>>>> enables run_time_ns stats and returns a valid fd.
>>>>>
>>>>> With BPF_ENABLE_RUNTIME_STATS, a user space tool would have the
>>>>> following flow:
>>>>>
>>>>>   1. Get a fd with BPF_ENABLE_RUNTIME_STATS, and make sure it is valid;
>>>>>   2. Check program run_time_ns;
>>>>>   3. Sleep for the monitoring period;
>>>>>   4. Check program run_time_ns again, calculate the difference;
>>>>>   5. Close the fd.
>>>>>
>>>>> Signed-off-by: Song Liu <redacted>
>>>>
>>>> Hmm, I see no relation to /dev/bpf_stats anymore, yet the subject
>>>> still talks about it?
>>>
>>> My fault. Will fix.
>>>
>>>> Also, should this have bpftool integration now that we have
>>>> `bpftool prog profile` support? Would be nice to then fetch the
>>>> related stats via bpf_prog_info, so users can consume this in an
>>>> easy way.
>>>
>>> We can add "run_time_ns" as a metric to "bpftool prog profile". But
>>> the mechanism is not the same. Let me think about this.
>>
>> Hm, true as well. Wouldn't extending the "bpftool prog profile"
>> fentry/fexit programs in the long term supersede this old bpf_stats
>> infrastructure? Iow, can't we implement the same (or even more
>> elaborate stats aggregation) in BPF via fentry/fexit and then
>> potentially deprecate the bpf_stats counters?
>
> I think run_time_ns has its own value as a simple monitoring framework.
> We can use it in tools like top (and variations). It will be easier for
> these tools to adopt run_time_ns than to use fentry/fexit.

Agree that this is easier. I presume there is no such official integration in tools like top today, right? Or is there anything planned?

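For illustration only: a top-like tool consuming run_time_ns would derive its per-interval numbers from two snapshots of a program's counters taken one monitoring period apart (steps 2 to 4 of the flow in the patch description). The sketch below assumes the tool has already read run_time_ns and run_cnt for a program, e.g. from struct bpf_prog_info via BPF_OBJ_GET_INFO_BY_FD; the struct and helper names are hypothetical and not part of any kernel or libbpf API.

```c
#include <stdint.h>

/* Hypothetical snapshot of the two per-program counters that
 * struct bpf_prog_info exposes (run_time_ns, run_cnt). */
struct prog_stats_sample {
	uint64_t run_time_ns;	/* cumulative runtime while stats enabled */
	uint64_t run_cnt;	/* cumulative number of invocations */
};

/* Step 4: runtime accumulated between the two snapshots. */
uint64_t delta_run_time_ns(const struct prog_stats_sample *before,
			   const struct prog_stats_sample *after)
{
	return after->run_time_ns - before->run_time_ns;
}

/* Average ns per invocation over the period; 0 if the program
 * did not run at all during the period. */
uint64_t avg_ns_per_run(const struct prog_stats_sample *before,
			const struct prog_stats_sample *after)
{
	uint64_t runs = after->run_cnt - before->run_cnt;

	return runs ? delta_run_time_ns(before, after) / runs : 0;
}
```

Since both counters only advance while stats are enabled, these deltas are meaningful only if stats stayed enabled for the whole period, which is exactly what the fd-based BPF_ENABLE_RUNTIME_STATS flow is meant to guarantee when several tools run concurrently.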
> On the other hand, in the long term, we may include a few fentry/fexit
> based programs in the kernel binary (or the rpm), so that these tools can
> use them easily. At that time, we can fully deprecate run_time_ns. Maybe
> this is not too far away?

Did you check how feasible it would be to have something like `bpftool prog profile top`, which then enables fentry/fexit for /all/ existing BPF programs in the system? It could then sort the programs over each sample interval by run_cnt, cycles, cache misses, aggregated runtime, etc. in a top-like output. Wdyt?

Thanks,
Daniel