[RFC 5/5] x86,seccomp: Add a seccomp fastpath
From: Alexei Starovoitov <hidden>
Date: 2014-06-11 21:29:23
Also in:
linux-arch, linux-mips, lkml
On Wed, Jun 11, 2014 at 1:23 PM, Andy Lutomirski [off-list ref] wrote:
On my VM, getpid takes about 70ns. Before this patch, adding a single-instruction always-accept seccomp filter added about 134ns of overhead to getpid. With this patch, the overhead is down to about 13ns.
interesting. Is this the gain from patch 4 into patch 5 or from patch 0 to patch 5? 13ns is still with seccomp enabled, but without filters?
quoted hunk ↗ jump to hunk
I'm not really thrilled by this patch. It has two main issues: 1. Calling into code in kernel/seccomp.c from assembly feels ugly. 2. The x86 64-bit syscall entry now has four separate code paths: fast, seccomp only, audit only, and slow. This kind of sucks. Would it be worth trying to rewrite the whole thing in C with a two-phase slow path approach like I'm using here for seccomp? Signed-off-by: Andy Lutomirski <luto@amacapital.net> --- arch/x86/kernel/entry_64.S | 45 +++++++++++++++++++++++++++++++++++++++++++++ include/linux/seccomp.h | 4 ++-- 2 files changed, 47 insertions(+), 2 deletions(-)diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S index f9e713a..feb32b2 100644 --- a/arch/x86/kernel/entry_64.S +++ b/arch/x86/kernel/entry_64.S@@ -683,6 +683,45 @@ sysret_signal: FIXUP_TOP_OF_STACK %r11, -ARGOFFSET jmp int_check_syscall_exit_work +#ifdef CONFIG_SECCOMP + /* + * Fast path for seccomp without any other slow path triggers. + */ +seccomp_fastpath: + /* Build seccomp_data */ + pushq %r9 /* args[5] */ + pushq %r8 /* args[4] */ + pushq %r10 /* args[3] */ + pushq %rdx /* args[2] */ + pushq %rsi /* args[1] */ + pushq %rdi /* args[0] */ + pushq RIP-ARGOFFSET+6*8(%rsp) /* rip */ + pushq %rax /* nr and junk */ + movl $AUDIT_ARCH_X86_64, 4(%rsp) /* arch */ + movq %rsp, %rdi + call seccomp_phase1
the assembler code is pretty much repeating what C does in populate_seccomp_data(). Assuming the whole gain came from patch 5 why asm version is so much faster than C? it skips SAVE/RESTORE_REST... what else? If the most of the gain is from all patches combined (mainly from 2 phase approach) then why bother with asm? Somehow it feels that the gain is due to better branch prediction in asm version. If you change few 'unlikely' in C to 'likely', it may get to the same performance... btw patches #1-3 look good to me. especially #1 is nice.
quoted hunk ↗ jump to hunk
+ addq $8*8, %rsp + cmpq $1, %rax + ja seccomp_invoke_phase2 + LOAD_ARGS 0 /* restore clobbered regs */ + jb system_call_fastpath + jmp ret_from_sys_call + +seccomp_invoke_phase2: + SAVE_REST + FIXUP_TOP_OF_STACK %rdi + movq %rax,%rdi + call seccomp_phase2 + + /* if seccomp says to skip, then set orig_ax to -1 and skip */ + test %eax,%eax + jz 1f + movq $-1, ORIG_RAX(%rsp) +1: + mov ORIG_RAX(%rsp), %rax /* reload rax */ + jmp system_call_post_trace /* and maybe do the syscall */ +#endif + #ifdef CONFIG_AUDITSYSCALL /* * Fast path for syscall audit without full syscall trace.@@ -717,6 +756,10 @@ sysret_audit: /* Do syscall tracing */ tracesys: +#ifdef CONFIG_SECCOMP + testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SECCOMP),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET) + jz seccomp_fastpath +#endif #ifdef CONFIG_AUDITSYSCALL testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET) jz auditsys@@ -725,6 +768,8 @@ tracesys: FIXUP_TOP_OF_STACK %rdi movq %rsp,%rdi call syscall_trace_enter + +system_call_post_trace: /* * Reload arg registers from stack in case ptrace changed them. * We don't reload %rax because syscall_trace_enter() returneddiff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index 4fc7a84..d3d4c52 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h@@ -37,8 +37,8 @@ static inline int secure_computing(void) #define SECCOMP_PHASE1_OK 0 #define SECCOMP_PHASE1_SKIP 1 -extern u32 seccomp_phase1(struct seccomp_data *sd); -int seccomp_phase2(u32 phase1_result); +asmlinkage __visible extern u32 seccomp_phase1(struct seccomp_data *sd); +asmlinkage __visible int seccomp_phase2(u32 phase1_result); #else extern void secure_computing_strict(int this_syscall); #endif --1.9.3 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo at vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/