Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark
From: Toke Høiland-Jørgensen <hidden>
Date: 2021-11-23 19:19:42
Joanne Koong writes:
Add benchmark to measure the throughput and latency of the bpf_loop
call.
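For readers unfamiliar with the helper: bpf_loop(nr_loops, callback_fn, callback_ctx, flags) invokes the callback up to nr_loops times and stops early if the callback returns 1. Below is a minimal userspace model of those documented semantics, not the kernel implementation. The names model_bpf_loop, loop_cb_t, MODEL_MAX_LOOPS, empty_cb and count_until_cb are hypothetical, and the 1 << 23 iteration cap is assumed to mirror the kernel's limit. The model makes one thing visible: every iteration is an indirect call through a function pointer, which is the kind of call that retpolines (and possibly virtualization) make more expensive.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: assumed to mirror the kernel's cap of 1 << 23 loops. */
#define MODEL_MAX_LOOPS (1u << 23)

/* Callback contract as documented for bpf_loop: 0 = continue, 1 = stop. */
typedef long (*loop_cb_t)(uint32_t index, void *ctx);

/* Hypothetical userspace model of bpf_loop(); NOT the kernel code. */
static long model_bpf_loop(uint32_t nr_loops, loop_cb_t cb, void *ctx,
                           uint64_t flags)
{
    uint32_t i;

    assert(cb != NULL);
    if (flags)                      /* no flags are defined: reject */
        return -EINVAL;
    if (nr_loops > MODEL_MAX_LOOPS) /* too many iterations requested */
        return -E2BIG;

    for (i = 0; i < nr_loops; i++) {
        /* One indirect call per iteration. */
        if (cb(i, ctx))
            return (long)i + 1;     /* stopped early; count includes this call */
    }
    return (long)i;                 /* number of iterations performed */
}

/* Example callback: does nothing, never stops the loop. */
static long empty_cb(uint32_t index, void *ctx)
{
    (void)index;
    (void)ctx;
    return 0;
}

/* Example callback: asks to stop once *limit iterations have run. */
static long count_until_cb(uint32_t index, void *ctx)
{
    const uint32_t *limit = ctx;
    return index + 1 >= *limit;
}
```

For example, model_bpf_loop(10, count_until_cb, &limit, 0) with limit = 3 runs three iterations and returns 3.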
Testing this in qemu on my dev machine with 1 thread gives the
following data:
nr_loops: 1
bpf_loop - throughput: 43.350 ± 0.864 M ops/s, latency: 23.068 ns/op
nr_loops: 10
bpf_loop - throughput: 69.586 ± 1.722 M ops/s, latency: 14.371 ns/op
nr_loops: 100
bpf_loop - throughput: 72.046 ± 1.352 M ops/s, latency: 13.880 ns/op
nr_loops: 500
bpf_loop - throughput: 71.677 ± 1.316 M ops/s, latency: 13.951 ns/op
nr_loops: 1000
bpf_loop - throughput: 69.435 ± 1.219 M ops/s, latency: 14.402 ns/op
nr_loops: 5000
bpf_loop - throughput: 72.624 ± 1.162 M ops/s, latency: 13.770 ns/op
nr_loops: 10000
bpf_loop - throughput: 75.417 ± 1.446 M ops/s, latency: 13.260 ns/op
nr_loops: 50000
bpf_loop - throughput: 77.400 ± 2.214 M ops/s, latency: 12.920 ns/op
nr_loops: 100000
bpf_loop - throughput: 78.636 ± 2.107 M ops/s, latency: 12.717 ns/op
nr_loops: 500000
bpf_loop - throughput: 76.909 ± 2.035 M ops/s, latency: 13.002 ns/op
nr_loops: 1000000
bpf_loop - throughput: 77.636 ± 1.748 M ops/s, latency: 12.881 ns/op
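The two columns above are not independent: every row is consistent with latency (ns/op) = 1000 / throughput (M ops/s), since 1e9 ns per second divided by 1e6 ops gives the factor 1000 (e.g. 1000 / 43.350 ≈ 23.068). A small C sketch of that conversion follows; the function names are hypothetical, and the ± spread on throughput is not propagated.

```c
#include <assert.h>

/* latency [ns/op] = 1e9 [ns/s] / (mops * 1e6 [ops/s]) = 1000 / mops */
static double mops_to_ns_per_op(double mops)
{
    assert(mops > 0.0);
    return 1000.0 / mops;
}

/* Inverse: throughput [M ops/s] from latency [ns/op]. */
static double ns_per_op_to_mops(double ns)
{
    assert(ns > 0.0);
    return 1000.0 / ns;
}
```

So a per-iteration cost of ~13 ns corresponds to roughly 77 M iterations per second.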
From this data, we can see that the latency per loop decreases as the
number of loops increases. On this particular machine, each loop had an
overhead of about 13 ns, and we were able to run ~70 million loops
per second.

The latency figures are great, thanks! I assume these numbers are with
retpolines enabled? Otherwise 12 ns seems a bit much... Or is this
because of qemu?

-Toke