Thread: 18 messages, 6 authors, 2021-11-09

Re: [RFC PATCH v3 0/3] Introduce BPF map tracing capability

From: Yonghong Song <hidden>
Date: 2021-11-04 04:24:03
Also in: bpf, lkml


On 11/3/21 10:49 AM, Alexei Starovoitov wrote:
> On Wed, Nov 3, 2021 at 10:45 AM Joe Burton [off-list ref] wrote:
>> Sort of - I hit issues when defining the function in the same
>> compilation unit as the call site. For example:
>>
>>    static noinline int bpf_array_map_trace_update(struct bpf_map *map,
>>                  void *key, void *value, u64 map_flags)
>
> Not quite :)
> You've had this issue because of 'static noinline'.
> Just 'noinline' would not have such issues even in the same file.

This does not seem to be true. With the latest trunk clang:

[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { return 1; }
int bar() { return foo() + foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o

t.o:    file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
        0: b8 01 00 00 00                movl    $1, %eax
        5: c3                            retq
        6: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:(%rax,%rax)

0000000000000010 <bar>:
       10: b8 02 00 00 00                movl    $2, %eax
       15: c3                            retq
[$ ~/tmp2]

The compiler still did the optimization, and the original noinline
function is still in the binary.

With a single foo() in bar() the effect is the same (see the last
example below).

asm("") indeed helped preserve the call.

[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { asm(""); return 1; }
int bar() { return foo() + foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o

t.o:    file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
        0: b8 01 00 00 00                movl    $1, %eax
        5: c3                            retq
        6: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:(%rax,%rax)

0000000000000010 <bar>:
       10: 50                            pushq   %rax
       11: e8 00 00 00 00                callq   0x16 <bar+0x6>
       16: e8 00 00 00 00                callq   0x1b <bar+0xb>
       1b: b8 02 00 00 00                movl    $2, %eax
       20: 59                            popq    %rcx
       21: c3                            retq
[$ ~/tmp2]

Note that with asm(""), foo() is called twice, but the optimizer still
knows that foo()'s return value is 1, so it did the calculation at
compile time, assigned 2 to %eax, and returned.

Having a single foo() in bar(), without asm(""), has the same effect
as the first example: the call is folded away.

[$ ~/tmp2] cat t.c
int __attribute__((noinline)) foo() { return 1; }
int bar() { return foo(); }
[$ ~/tmp2] clang -O2 -c t.c
[$ ~/tmp2] llvm-objdump -d t.o

t.o:    file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <foo>:
        0: b8 01 00 00 00                movl    $1, %eax
        5: c3                            retq
        6: 66 2e 0f 1f 84 00 00 00 00 00 nopw    %cs:(%rax,%rax)

0000000000000010 <bar>:
       10: b8 01 00 00 00                movl    $1, %eax
       15: c3                            retq
[$ ~/tmp2]
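
Going back to the asm("") variant above, where both calls survive but
the result is still folded to 2: a minimal sketch of how the value
itself could be hidden is to pass it through an empty asm statement as
a read-write operand. This is only an illustration, not something
proposed in this thread. foo_opaque(), bar_opaque() and the "+r"
operand are assumptions of the example, and the code actually generated
depends on the compiler and its version.

```c
/* Hypothetical variant of the t.c example; names are illustrative only. */
int __attribute__((noinline)) foo_opaque(void)
{
	int ret = 1;

	/*
	 * Empty asm with ret as an in/out register operand: the compiler
	 * must assume the asm may have changed ret, so it cannot treat
	 * the return value as the constant 1.
	 */
	__asm__ __volatile__("" : "+r"(ret));
	return ret;
}

int bar_opaque(void)
{
	/* The sum is expected to be computed from the two call results. */
	return foo_opaque() + foo_opaque();
}
```

Disassembling the object file with llvm-objdump -d, as in the
transcripts above, is the way to check what a given compiler does with
it.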

I checked with a few LLVM compiler engineers at Facebook.
They mentioned that nothing prevents the compiler from looking
inside a noinline function and optimizing the caller based on
that knowledge.
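
One way to take that knowledge away from the compiler, sketched here
under the assumption that a weak definition is acceptable, is to mark
the function weak as well as noinline. A weak definition may be
replaced by another definition at link time, so GCC and clang generally
avoid deriving facts about the caller from its body. foo_weak() and
bar_weak() are hypothetical names, and this is an illustration rather
than the approach settled on in this thread.

```c
/* Hypothetical variant of the t.c example; names are illustrative only. */
int __attribute__((weak, noinline)) foo_weak(void)
{
	/* May be overridden by a non-weak foo_weak() in another object. */
	return 1;
}

int bar_weak(void)
{
	/* The compiler cannot assume which foo_weak() body runs here. */
	return foo_weak() + foo_weak();
}
```

As with the other variants, the disassembly from llvm-objdump -d shows
whether both calls and the addition survive for a given compiler.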

> Reminder: please don't top post and trim your replies.