Re: [PATCH bpf-next v2 02/18] x86,bpf: add bpf_global_caller for global trampoline
From: Alexei Starovoitov <hidden>
Date: 2025-07-15 16:35:22
Also in: bpf, lkml
On Tue, Jul 15, 2025 at 1:37 AM Menglong Dong [off-list ref] wrote:
> On 7/15/25 10:25, Alexei Starovoitov wrote:
> > On Thu, Jul 3, 2025 at 5:17 AM Menglong Dong [off-list ref] wrote:
> > > +static __always_inline void
> > > +do_origin_call(unsigned long *args, unsigned long *ip, int nr_args)
> > > +{
> > > +	/* Following code will be optimized by the compiler, as nr_args
> > > +	 * is a const, and there will be no condition here.
> > > +	 */
> > > +	if (nr_args == 0) {
> > > +		asm volatile(
> > > +			RESTORE_ORIGIN_0 CALL_NOSPEC "\n"
> > > +			"movq %%rax, %0\n"
> > > +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> > > +			: [args]"r"(args), [thunk_target]"r"(*ip)
> > > +			:
> > > +		);
> > > +	} else if (nr_args == 1) {
> > > +		asm volatile(
> > > +			RESTORE_ORIGIN_1 CALL_NOSPEC "\n"
> > > +			"movq %%rax, %0\n"
> > > +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> > > +			: [args]"r"(args), [thunk_target]"r"(*ip)
> > > +			: "rdi"
> > > +		);
> > > +	} else if (nr_args == 2) {
> > > +		asm volatile(
> > > +			RESTORE_ORIGIN_2 CALL_NOSPEC "\n"
> > > +			"movq %%rax, %0\n"
> > > +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> > > +			: [args]"r"(args), [thunk_target]"r"(*ip)
> > > +			: "rdi", "rsi"
> > > +		);
> > > +	} else if (nr_args == 3) {
> > > +		asm volatile(
> > > +			RESTORE_ORIGIN_3 CALL_NOSPEC "\n"
> > > +			"movq %%rax, %0\n"
> > > +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> > > +			: [args]"r"(args), [thunk_target]"r"(*ip)
> > > +			: "rdi", "rsi", "rdx"
> > > +		);
> > > +	} else if (nr_args == 4) {
> > > +		asm volatile(
> > > +			RESTORE_ORIGIN_4 CALL_NOSPEC "\n"
> > > +			"movq %%rax, %0\n"
> > > +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> > > +			: [args]"r"(args), [thunk_target]"r"(*ip)
> > > +			: "rdi", "rsi", "rdx", "rcx"
> > > +		);
> > > +	} else if (nr_args == 5) {
> > > +		asm volatile(
> > > +			RESTORE_ORIGIN_5 CALL_NOSPEC "\n"
> > > +			"movq %%rax, %0\n"
> > > +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> > > +			: [args]"r"(args), [thunk_target]"r"(*ip)
> > > +			: "rdi", "rsi", "rdx", "rcx", "r8"
> > > +		);
> > > +	} else if (nr_args == 6) {
> > > +		asm volatile(
> > > +			RESTORE_ORIGIN_6 CALL_NOSPEC "\n"
> > > +			"movq %%rax, %0\n"
> > > +			: "=m"(args[nr_args]), ASM_CALL_CONSTRAINT
> > > +			: [args]"r"(args), [thunk_target]"r"(*ip)
> > > +			: "rdi", "rsi", "rdx", "rcx", "r8", "r9"
> > > +		);
> > > +	}
> > > +}
> >
> > What is the performance difference between 0-6 variants?
> > I would think save/restore of regs shouldn't be that expensive.
> > bpf trampoline saves only what's necessary because it can do
> > this micro optimization, but for this one, I think, doing
> > _one_ global trampoline that covers all cases will simplify the code
> > a lot, but please benchmark the difference to understand
> > the trade-off.
>
> According to my benchmark, it has ~5% overhead to save/restore
> *5* variants when compared with *0* variant. The save/restore of regs
> is fast, but it still needs 12 insn, which can produce ~6% overhead.

I think it's an ok trade off, because with one global trampoline
we do not need to call rhashtable lookup before entering bpf prog.
bpf prog will do it on demand if/when it needs to access arguments.
This will compensate for a bit of lost performance due to extra save/restore.

PS
pls don't add your chinatelecom.cn email in cc.
gmail just cannot deliver there and it's annoying to keep deleting
it manually in every reply.
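
For readers following the thread, the trade-off above can be sketched in plain userspace C. This is only a toy model under stated assumptions, not the patch's code: `struct func_md`, `md_lookup()`, `global_enter()` and the two example progs are hypothetical names, and a linear table scan stands in for the rhashtable lookup. The single entry path always copies all six argument slots and never does a lookup; only a prog that actually reads arguments pays for one.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ARGS 6	/* x86-64 passes up to six integer args in registers */

/* Hypothetical per-function metadata, standing in for what a lookup returns. */
struct func_md {
	unsigned long ip;
	int nr_args;
};

/* Toy table standing in for a hash table keyed by function address. */
static const struct func_md md_table[] = {
	{ 0x1000, 2 },
	{ 0x2000, 5 },
};

static int lookup_count;	/* number of lookups actually performed */

static const struct func_md *md_lookup(unsigned long ip)
{
	lookup_count++;
	for (size_t i = 0; i < sizeof(md_table) / sizeof(md_table[0]); i++)
		if (md_table[i].ip == ip)
			return &md_table[i];
	return NULL;
}

struct tramp_ctx {
	unsigned long ip;
	unsigned long args[MAX_ARGS];
};

typedef unsigned long (*prog_fn)(const struct tramp_ctx *ctx);

/* One entry path for every function: save all six slots, no lookup. */
static unsigned long global_enter(unsigned long ip,
				  const unsigned long regs[MAX_ARGS],
				  prog_fn prog)
{
	struct tramp_ctx ctx = { .ip = ip };

	for (int i = 0; i < MAX_ARGS; i++)
		ctx.args[i] = regs[i];
	return prog(&ctx);
}

/* A prog that never touches arguments: it pays for no lookup at all. */
static unsigned long prog_count_only(const struct tramp_ctx *ctx)
{
	(void)ctx;
	return 1;
}

/* A prog that reads the last real argument: it looks up nr_args on demand. */
static unsigned long prog_last_arg(const struct tramp_ctx *ctx)
{
	const struct func_md *md = md_lookup(ctx->ip);

	if (!md || md->nr_args == 0)
		return 0;
	assert(md->nr_args <= MAX_ARGS);
	return ctx->args[md->nr_args - 1];
}
```

In this model, calling `global_enter()` with `prog_count_only` leaves `lookup_count` untouched, while `prog_last_arg` increments it once per call. That is the sense in which the extra save/restore cost is offset by skipping the unconditional lookup.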