Thread (14 messages) 14 messages, 5 authors, 2014-06-13

[RFC 5/5] x86,seccomp: Add a seccomp fastpath

From: Alexei Starovoitov <hidden>
Date: 2014-06-11 21:29:23
Also in: linux-arch, linux-mips, lkml

On Wed, Jun 11, 2014 at 1:23 PM, Andy Lutomirski [off-list ref] wrote:
On my VM, getpid takes about 70ns.  Before this patch, adding a
single-instruction always-accept seccomp filter added about 134ns of
overhead to getpid.  With this patch, the overhead is down to about
13ns.
interesting.
Is this the gain from patch 4 into patch 5 or from patch 0 to patch 5?
13ns is still with seccomp enabled, but without filters?
quoted hunk ↗ jump to hunk
I'm not really thrilled by this patch.  It has two main issues:

1. Calling into code in kernel/seccomp.c from assembly feels ugly.

2. The x86 64-bit syscall entry now has four separate code paths:
fast, seccomp only, audit only, and slow.  This kind of sucks.
Would it be worth trying to rewrite the whole thing in C with a
two-phase slow path approach like I'm using here for seccomp?

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
---
 arch/x86/kernel/entry_64.S | 45 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/seccomp.h    |  4 ++--
 2 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index f9e713a..feb32b2 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -683,6 +683,45 @@ sysret_signal:
        FIXUP_TOP_OF_STACK %r11, -ARGOFFSET
        jmp int_check_syscall_exit_work

+#ifdef CONFIG_SECCOMP
+       /*
+        * Fast path for seccomp without any other slow path triggers.
+        */
+seccomp_fastpath:
+       /* Build seccomp_data */
+       pushq %r9                               /* args[5] */
+       pushq %r8                               /* args[4] */
+       pushq %r10                              /* args[3] */
+       pushq %rdx                              /* args[2] */
+       pushq %rsi                              /* args[1] */
+       pushq %rdi                              /* args[0] */
+       pushq RIP-ARGOFFSET+6*8(%rsp)           /* rip */
+       pushq %rax                              /* nr and junk */
+       movl $AUDIT_ARCH_X86_64, 4(%rsp)        /* arch */
+       movq %rsp, %rdi
+       call seccomp_phase1
the assembler code is pretty much repeating what C does in
populate_seccomp_data(). Assuming the whole gain came from
patch 5 why asm version is so much faster than C?
it skips SAVE/RESTORE_REST... what else?
If the most of the gain is from all patches combined (mainly from
2 phase approach) then why bother with asm?

Somehow it feels that the gain is due to better branch prediction
in asm version. If you change few 'unlikely' in C to 'likely', it may
get to the same performance...

btw patches #1-3 look good to me. especially #1 is nice.
quoted hunk ↗ jump to hunk
+       addq $8*8, %rsp
+       cmpq $1, %rax
+       ja seccomp_invoke_phase2
+       LOAD_ARGS 0  /* restore clobbered regs */
+       jb system_call_fastpath
+       jmp ret_from_sys_call
+
+seccomp_invoke_phase2:
+       SAVE_REST
+       FIXUP_TOP_OF_STACK %rdi
+       movq %rax,%rdi
+       call seccomp_phase2
+
+       /* if seccomp says to skip, then set orig_ax to -1 and skip */
+       test %eax,%eax
+       jz 1f
+       movq $-1, ORIG_RAX(%rsp)
+1:
+       mov ORIG_RAX(%rsp), %rax                /* reload rax */
+       jmp system_call_post_trace              /* and maybe do the syscall */
+#endif
+
 #ifdef CONFIG_AUDITSYSCALL
        /*
         * Fast path for syscall audit without full syscall trace.
@@ -717,6 +756,10 @@ sysret_audit:

        /* Do syscall tracing */
 tracesys:
+#ifdef CONFIG_SECCOMP
+       testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SECCOMP),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+       jz seccomp_fastpath
+#endif
 #ifdef CONFIG_AUDITSYSCALL
        testl $(_TIF_WORK_SYSCALL_ENTRY & ~_TIF_SYSCALL_AUDIT),TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
        jz auditsys
@@ -725,6 +768,8 @@ tracesys:
        FIXUP_TOP_OF_STACK %rdi
        movq %rsp,%rdi
        call syscall_trace_enter
+
+system_call_post_trace:
        /*
         * Reload arg registers from stack in case ptrace changed them.
         * We don't reload %rax because syscall_trace_enter() returned
diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
index 4fc7a84..d3d4c52 100644
--- a/include/linux/seccomp.h
+++ b/include/linux/seccomp.h
@@ -37,8 +37,8 @@ static inline int secure_computing(void)
 #define SECCOMP_PHASE1_OK      0
 #define SECCOMP_PHASE1_SKIP    1

-extern u32 seccomp_phase1(struct seccomp_data *sd);
-int seccomp_phase2(u32 phase1_result);
+asmlinkage __visible extern u32 seccomp_phase1(struct seccomp_data *sd);
+asmlinkage __visible int seccomp_phase2(u32 phase1_result);
 #else
 extern void secure_computing_strict(int this_syscall);
 #endif
--
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help