Thread (11 messages), 3 authors, 2021-11-08

Re: [RFC PATCH bpf-next 2/2] selftests/bpf: add benchmark bpf_strcmp

From: Andrii Nakryiko <hidden>
Date: 2021-11-08 18:00:44
Also in: bpf

On Mon, Nov 8, 2021 at 6:05 AM Hou Tao [off-list ref] wrote:
Hi,

On 11/7/2021 2:43 AM, Alexei Starovoitov wrote:
On Sat, Nov 06, 2021 at 09:28:22PM +0800, Hou Tao wrote:
The benchmark runs a loop 5000 times. In the loop it reads the file name
from kprobe argument into stack by using bpf_probe_read_kernel_str(),
and compares the file name with a target character or string.

Three cases are compared: compare only one character, compare the whole
string by a home-made strncmp(), and compare the whole string by
bpf_strncmp().

The following is the result:

x86-64 host:

one character: 2613499 ns
whole str by strncmp: 2920348 ns
whole str by helper: 2779332 ns

arm64 host:

one character: 3898867 ns
whole str by strncmp: 4396787 ns
whole str by helper: 3968113 ns

Compared with home-made strncmp, the performance of bpf_strncmp helper
improves 80% under x86-64 and 600% under arm64. The big performance win
on arm64 may come from its arch-optimized strncmp().
80% and 600% improvement?!
I don't understand how this math works.
Why one char is barely different in total nsec than the whole string?
The string shouldn't miscompare on the first char as far as I understand the test.
Because the result of "one character" includes the overhead of process filtering and
string read.
My bad, I should have explained the test results in more detail.
Maybe use the bench framework for your benchmark? It lets you set up the
benchmark and collect measurements in a more structured way. Check
some existing benchmarks under benchs/ in the selftests/bpf directory.

To actually test just bpf_strncmp(), don't add
bpf_probe_read_kernel_str() into the loop logic; set your data in a
global variable and just search it. This will give you more accurate
microbenchmark data.
Three tests are exercised:

(1) one character
Filter out unexpected callers with bpf_get_current_pid_tgid()
Use bpf_probe_read_kernel_str() to read the file name into a 64-byte buffer
on the stack
Compare only the first character of the file name

(2) whole str by strncmp
Filter out unexpected callers with bpf_get_current_pid_tgid()
Use bpf_probe_read_kernel_str() to read the file name into a 64-byte buffer
on the stack
Compare using the home-made strncmp(): the two strings are identical, so
the whole string is compared

(3) whole str by helper
Filter out unexpected callers with bpf_get_current_pid_tgid()
Use bpf_probe_read_kernel_str() to read the file name into a 64-byte buffer
on the stack
Compare using bpf_strncmp(): the two strings are identical, so
the whole string is compared
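
For reference, the "home-made strncmp()" in case (2) could look roughly
like the sketch below. The implementation from the patch is not shown in
this message, so the name local_strncmp() and its exact return values are
assumptions made for illustration only.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the benchmark's home-made strncmp().
 * Compares at most n bytes and stops at the first difference or at
 * the terminating NUL, whichever comes first.
 */
static int local_strncmp(const char *s1, const char *s2, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned char c1 = (unsigned char)s1[i];
		unsigned char c2 = (unsigned char)s2[i];

		if (c1 != c2)
			return c1 < c2 ? -1 : 1;
		if (c1 == '\0')
			break;
	}
	return 0;
}
```

When the two strings are identical, as in cases (2) and (3), the loop
walks every byte up to the NUL, which is the worst case for a
byte-at-a-time comparison.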

Now "(1) one character" is used to measure the overhead of process filtering and
string read. So under x86-64, the overhead of strncmp() is:

  total time of whole str by strncmp test - total time of one character test =
  2920348 - 2613499 = 306849 ns

The overhead of bpf_strncmp() is:

  total time of whole str by helper test - total time of one character test =
  2779332 - 2613499 = 165833 ns

So the performance win is about (306849 / 165833) * 100 - 100 = ~85%

And the win under arm64, with overheads of 4396787 - 3898867 = 497920 ns and
3968113 - 3898867 = 69246 ns, is about (497920 / 69246) * 100 - 100 = ~600%
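
The baseline subtraction above can be written as a small userspace
calculation. This is only a sketch of the arithmetic; the function names
compare_overhead_ns() and win_percent() are made up for illustration and
are not part of the patch.

```c
#include <stdint.h>

/*
 * Time attributable to the comparison alone: the total run time minus
 * the shared baseline (the "one character" run, which already pays for
 * process filtering and the string read).
 */
static int64_t compare_overhead_ns(int64_t total_ns, int64_t baseline_ns)
{
	return total_ns - baseline_ns;
}

/* Relative win of the helper over the home-made loop, in percent. */
static double win_percent(int64_t strncmp_overhead_ns,
			  int64_t helper_overhead_ns)
{
	return (double)strncmp_overhead_ns /
	       (double)helper_overhead_ns * 100.0 - 100.0;
}
```

With the figures from the quoted results, win_percent(306849, 165833)
gives about 85% for x86-64 and win_percent(497920, 69246) gives about
619% for arm64.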