Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark

[PATCH v2 bpf-next 0/4] Add bpf_loop_helper · Joanne Koong <hidden> · 2021-11-23
[PATCH v2 bpf-next 1/4] bpf: Add bpf_loop helper · Joanne Koong <hidden> · 2021-11-23
Re: [PATCH v2 bpf-next 1/4] bpf: Add bpf_loop helper · Andrii Nakryiko <hidden> · 2021-11-23
[PATCH v2 bpf-next 2/4] selftests/bpf: Add bpf_loop test · Joanne Koong <hidden> · 2021-11-23
[PATCH v2 bpf-next 3/4] selftests/bpf: measure bpf_loop verifier performance · Joanne Koong <hidden> · 2021-11-23
[PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark · Joanne Koong <hidden> · 2021-11-23
Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark · Toke Høiland-Jørgensen <hidden> · 2021-11-23
Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark · Joanne Koong <hidden> · 2021-11-24
Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark · Toke Høiland-Jørgensen <hidden> · 2021-11-24
Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark · Andrii Nakryiko <hidden> · 2021-11-24
Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark · Toke Høiland-Jørgensen <hidden> · 2021-11-24
Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark · Joanne Koong <hidden> · 2021-11-25
Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark · Toke Høiland-Jørgensen <hidden> · 2021-11-25
Re: [PATCH v2 bpf-next 4/4] selftest/bpf/benchs: add bpf_loop benchmark · Joanne Koong <hidden> · 2021-11-29
Re: [PATCH v2 bpf-next 0/4] Add bpf_loop_helper · Joanne Koong <hidden> · 2021-11-23

From: Toke Høiland-Jørgensen <hidden>
Date: 2021-11-23 19:19:42

Joanne Koong [off-list ref] writes:

Add benchmark to measure the throughput and latency of the bpf_loop
call.

Testing this on qemu on my dev machine on 1 thread, the data is
as follows:

        nr_loops: 1
bpf_loop - throughput: 43.350 ± 0.864 M ops/s, latency: 23.068 ns/op

        nr_loops: 10
bpf_loop - throughput: 69.586 ± 1.722 M ops/s, latency: 14.371 ns/op

        nr_loops: 100
bpf_loop - throughput: 72.046 ± 1.352 M ops/s, latency: 13.880 ns/op

        nr_loops: 500
bpf_loop - throughput: 71.677 ± 1.316 M ops/s, latency: 13.951 ns/op

        nr_loops: 1000
bpf_loop - throughput: 69.435 ± 1.219 M ops/s, latency: 14.402 ns/op

        nr_loops: 5000
bpf_loop - throughput: 72.624 ± 1.162 M ops/s, latency: 13.770 ns/op

        nr_loops: 10000
bpf_loop - throughput: 75.417 ± 1.446 M ops/s, latency: 13.260 ns/op

        nr_loops: 50000
bpf_loop - throughput: 77.400 ± 2.214 M ops/s, latency: 12.920 ns/op

        nr_loops: 100000
bpf_loop - throughput: 78.636 ± 2.107 M ops/s, latency: 12.717 ns/op

        nr_loops: 500000
bpf_loop - throughput: 76.909 ± 2.035 M ops/s, latency: 13.002 ns/op

        nr_loops: 1000000
bpf_loop - throughput: 77.636 ± 1.748 M ops/s, latency: 12.881 ns/op

From this data, we can see that the latency per loop decreases as the
number of loops increases: from ~23 ns at nr_loops = 1 to ~13-14 ns once
nr_loops reaches 10, staying roughly flat after that. On this particular
machine, each loop had an overhead of about 13 ns, and we were able to
run ~70 million loops per second.
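
The two columns in the output above appear to be the same measurement in different units. A minimal sketch of the conversion, assuming the benchmark derives ns/op directly from throughput (the reported figures are consistent with that); the function name and the sample selection are illustrative, not taken from the patch:

```python
def latency_ns_per_op(throughput_mops: float) -> float:
    """Convert throughput in millions of ops/s to average latency in ns/op."""
    if throughput_mops <= 0:
        raise ValueError("throughput must be positive")
    # 1 s = 1e9 ns and 1 M ops = 1e6 ops, so ns/op = 1e9 / (throughput * 1e6).
    return 1e3 / throughput_mops


# (nr_loops, throughput in M ops/s, reported latency in ns/op),
# copied from three rows of the benchmark output quoted above.
REPORTED = [
    (1, 43.350, 23.068),
    (10, 69.586, 14.371),
    (100000, 78.636, 12.717),
]

for nr_loops, mops, reported_ns in REPORTED:
    derived = latency_ns_per_op(mops)
    print(f"nr_loops={nr_loops}: derived {derived:.3f} ns/op, "
          f"reported {reported_ns:.3f} ns/op")
```

Under that assumption the latency column carries no information beyond the throughput column, so the ± spread on throughput applies to the latency as well.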

The latency figures are great, thanks! I assume these numbers are with
retpolines enabled? Otherwise 12ns seems a bit much... Or is this
because of qemu?

-Toke
