[RFC PATCH 0/4] tracing/probes: Optimize fetcharg with BPF
From: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Date: 2026-07-01 13:45:26
Also in: bpf, linux-kselftest, lkml
Hi,
I investigated the feasibility of optimizing `fetcharg` in probe events
by converting it to BPF. The results look promising: it reduces the
overhead by about 30% (and maybe more with more than 3 arguments).
I had expected little difference, because I guessed that the major source
of overhead was unsafe pointer dereferencing (e.g.
copy_from_kernel_nofault()). In fact, without CONFIG_BPF_JIT the overhead
is more than double. With the JIT compiler, however, it performs better.
The basic concept is quite simple. The process remains the same up until
the point where user input is converted into `fetcharg` code. It is
possible to convert some of the fundamental `fetcharg` operations into
an equivalent sequence of BPF instructions. This creates a single
`bpf_prog` for each probe event (rather than one per argument).
This program executes within the event handler, reads `pt_regs` directly,
and stores the results in the ftrace ring buffer, just as `fetcharg`
does.
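As a rough illustration of this idea only (not the code in this series), the structural change can be modeled in a few lines of Python: instead of walking each argument's op list on every hit, all arguments are translated once into a single flat program per event. The op names `REG` and `DEREF`, the register dict, and the memory dict are hypothetical stand-ins for fetcharg ops, `pt_regs`, and kernel memory.

```python
# Toy model: "REG"/"DEREF" ops and dict-based registers/memory are
# hypothetical stand-ins, not the kernel's data structures.

def _store(val, regs, mem, out):
    """End of one argument: append the value to the event record."""
    out.append(val)
    return val


def interpret(args, regs, mem):
    """Legacy style: walk every argument's op list on each probe hit."""
    out = []
    for ops in args:
        val = 0
        for op, operand in ops:
            if op == "REG":
                val = regs[operand]          # read a register
            elif op == "DEREF":
                val = mem[val + operand]     # load from (val + offset)
            else:
                raise ValueError(f"unsupported op: {op}")
        out.append(val)
    return out


def compile_event(args):
    """Translate all arguments into ONE flat step list at setup time."""
    steps = []
    for ops in args:
        for op, operand in ops:
            if op == "REG":
                steps.append(lambda val, regs, mem, out, r=operand: regs[r])
            elif op == "DEREF":
                steps.append(
                    lambda val, regs, mem, out, off=operand: mem[val + off])
            else:
                raise ValueError(f"unsupported op: {op}")
        steps.append(_store)

    def prog(regs, mem):
        out, val = [], 0
        for step in steps:                   # no per-op dispatch at hit time
            val = step(val, regs, mem, out)
        return out

    return prog


# Two arguments: a dereference of (di + 8), and the raw value of si.
regs = {"di": 0x1000, "si": 7}
mem = {0x1008: 42}
args = [[("REG", "di"), ("DEREF", 8)], [("REG", "si")]]
prog = compile_event(args)
print(interpret(args, regs, mem), prog(regs, mem))
```

The model only shows the shape of the conversion (one program per event, op dispatch resolved at setup time). It says nothing about speed; in the measurements below the gain depends on the BPF JIT.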
Here are the benchmark results, measured on qemu (KVM) on an Intel Core
i7-8565U.
When enabling BPF with JIT:
--------------------------------------------------------------------------------
Configuration   0 Fetchargs   1 Fetcharg  2 Fetchargs  3 Fetchargs
--------------------------------------------------------------------------------
Baseline          298882359            -            -            -  loops/sec
                          -            -            -            -  overhead
Kprobe              9740841      8664195      7944956      7608274  loops/sec
                   99.31 ns     12.76 ns     23.21 ns     28.78 ns  overhead
Fprobe             10827749      9220918      7992512      7683757  loops/sec
                   89.01 ns     16.09 ns     32.76 ns     37.79 ns  overhead
Eprobe              6746389      6245994      5319037      4845406  loops/sec
                  144.88 ns     11.88 ns     39.78 ns     58.15 ns  overhead
--------------------------------------------------------------------------------
When enabling BPF without JIT:
--------------------------------------------------------------------------------
Configuration   0 Fetchargs   1 Fetcharg  2 Fetchargs  3 Fetchargs
--------------------------------------------------------------------------------
Baseline           84067374            -            -            -  loops/sec
                          -            -            -            -  overhead
Kprobe              7092949      5834913      3848776      3443408  loops/sec
                  129.09 ns     30.40 ns    118.84 ns    149.42 ns  overhead
Fprobe              9426302      6441734      4350313      3710814  loops/sec
                   94.19 ns     49.15 ns    123.78 ns    163.40 ns  overhead
Eprobe              5681716      4958113      3940999      3953434  loops/sec
                  164.11 ns     25.69 ns     77.74 ns     76.94 ns  overhead
--------------------------------------------------------------------------------
When disabling BPF (legacy fetcharg):
--------------------------------------------------------------------------------
Configuration   0 Fetchargs   1 Fetcharg  2 Fetchargs  3 Fetchargs
--------------------------------------------------------------------------------
Baseline          245433525            -            -            -  loops/sec
                          -            -            -            -  overhead
Kprobe              9055348      8488351      7219595      6453928  loops/sec
                  106.36 ns      7.38 ns     28.08 ns     44.51 ns  overhead
Fprobe             10859326      9288801      7492518      6607046  loops/sec
                   88.01 ns     15.57 ns     41.38 ns     59.27 ns  overhead
Eprobe              6987128      5114526      5055084      4803759  loops/sec
                  139.05 ns     52.40 ns     54.70 ns     65.05 ns  overhead
--------------------------------------------------------------------------------
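The overhead rows appear to be per-iteration time differences derived from the loops/sec rows: the 0-fetcharg column relative to the baseline, and the N-fetcharg columns relative to the same probe with 0 fetchargs. This reading is inferred from the numbers and is not stated in the mail; the sketch below reproduces the Kprobe row of the BPF+JIT table under that assumption.

```python
def ns_per_loop(loops_per_sec):
    """Convert a throughput figure into nanoseconds per iteration."""
    return 1e9 / loops_per_sec


# Values copied from the BPF+JIT table.
baseline = 298882359
kprobe = [9740841, 8664195, 7944956, 7608274]   # 0, 1, 2, 3 fetchargs

# Cost of the probe itself: 0-fetcharg run versus the unprobed baseline.
probe_overhead = ns_per_loop(kprobe[0]) - ns_per_loop(baseline)

# Cost of fetching N arguments: N-fetcharg run versus the 0-fetcharg run.
fetch_overhead = [ns_per_loop(x) - ns_per_loop(kprobe[0]) for x in kprobe[1:]]

print(f"probe: {probe_overhead:.2f} ns")                    # probe: 99.31 ns
print("fetchargs:", [f"{v:.2f} ns" for v in fetch_overhead])
# fetchargs: ['12.76 ns', '23.21 ns', '28.78 ns']
```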
The numbers are still unstable (because of a problem in the benchmark),
but the trend shows that BPF+JIT is the winner.
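To put the trend in relative terms, the 3-fetcharg overhead figures from the three tables can be compared directly. This is only a back-of-the-envelope check: the runs have different baselines and the numbers are noisy, so the ratios are indicative at best. The dict names are made up for this sketch.

```python
# 3-fetcharg overhead in ns, copied from the tables above.
legacy = {"Kprobe": 44.51, "Fprobe": 59.27, "Eprobe": 65.05}
bpf_jit = {"Kprobe": 28.78, "Fprobe": 37.79, "Eprobe": 58.15}
bpf_nojit = {"Kprobe": 149.42, "Fprobe": 163.40, "Eprobe": 76.94}

# Fraction of the legacy fetch overhead saved by BPF with JIT.
saved = {name: 1 - bpf_jit[name] / legacy[name] for name in legacy}

# How many times slower BPF without JIT is than the legacy fetcharg.
slowdown = {name: bpf_nojit[name] / legacy[name] for name in legacy}

for name in legacy:
    print(f"{name}: JIT saves {saved[name]:.0%}, "
          f"no-JIT is {slowdown[name]:.1f}x legacy")
```

With these inputs the JIT case saves roughly 35% for Kprobe, 36% for Fprobe and 11% for Eprobe, while the non-JIT case is more than twice as slow as legacy for Kprobe and Fprobe.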
TODOs:
- Add a new Kconfig which depends on CONFIG_BPF_JIT=y.
- Ensure that processing of subsequent arguments continues even if a
  single dereference operation fails.
- Allow mixing with unsupported FETCH_OPs on the same event.
Thank you,
---
base-commit: c0c56fe6fb52cfb28419242cfa6235125f818f94
Masami Hiramatsu (Google) (4):
  tools/tracing: Add fetcharg performance micro-benchmark
  tracing/probes: Compile all fetchargs into a single BPF program per event
  tracing: Add disable_bpf trace option to ignore eBPF for fetchargs
  selftests/ftrace: Add a test for eBPF compiled fetchargs

 kernel/trace/trace.c                             |   7 +
 kernel/trace/trace.h                             |   8 +
 kernel/trace/trace_probe.c                       | 249 ++++++++++++++++++++
 kernel/trace/trace_probe.h                       |  15 +
 kernel/trace/trace_probe_tmpl.h                  |  13 +
 .../ftrace/test.d/dynevent/test_bpf_fetchargs.tc |  51 ++++
 tools/tracing/benchmark/Kbuild                   |   3
 tools/tracing/benchmark/Makefile                 |  12 +
 tools/tracing/benchmark/bench_fetcharg.sh        | 195 ++++++++++++++++
 tools/tracing/benchmark/fetcharg_bench.c         |  98 ++++++++
 tools/tracing/benchmark/fetcharg_bench_trace.h   |  37 +++
 11 files changed, 684 insertions(+), 4 deletions(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/dynevent/test_bpf_fetchargs.tc
create mode 100644 tools/tracing/benchmark/Kbuild
create mode 100644 tools/tracing/benchmark/Makefile
create mode 100755 tools/tracing/benchmark/bench_fetcharg.sh
create mode 100644 tools/tracing/benchmark/fetcharg_bench.c
create mode 100644 tools/tracing/benchmark/fetcharg_bench_trace.h
--
Masami Hiramatsu (Google) [off-list ref]