Thread (28 messages) 28 messages, 4 authors, 2015-02-07

[PATCH v5 3/5] x86: Split syscall_trace_enter into two phases

From: luto@amacapital.net (Andy Lutomirski)
Date: 2015-02-06 20:20:44
Also in: linux-arch, linux-mips, lkml

On Fri, Feb 6, 2015 at 12:16 PM, Kees Cook [off-list ref] wrote:
On Fri, Feb 6, 2015 at 12:12 PM, Andy Lutomirski [off-list ref] wrote:
quoted
On Fri, Feb 6, 2015 at 12:07 PM, Kees Cook [off-list ref] wrote:
quoted
On Fri, Feb 6, 2015 at 11:32 AM, Andy Lutomirski [off-list ref] wrote:
quoted
On Fri, Feb 6, 2015 at 11:23 AM, Kees Cook [off-list ref] wrote:
quoted
And especially since a ptracer
can change syscalls during syscall-enter-stop to any syscall it wants,
bypassing seccomp. This condition is already documented.
If a ptracer (using PTRACE_SYSCALL) were to get the entry callback
before seccomp, then this oddity would go away, which might be a good
thing.  A ptracer could change the syscall, but seccomp would based on
what the ptracer changed the syscall to.
I want kill events to trigger immediately. I don't want to leave the
ptrace surface available on a SECCOMP_RET_KILL. So maybe it can be
seccomp phase 1, then ptrace, then seccomp phase 2? And pass more
information between phases to determine how things should behave
beyond just "skip"?
I thought so too, originally, but I'm far less convinced now, for two reasons:

1. I think that a lot of filters these days use RET_ERRNO heavily, so
this won't benefit them.

2. I'm not convinced it really reduces the attack surface for anyone.
Unless your filter is literally "return SECCOMP_RET_KILL", then the
seccomp-filtered task can always cause the ptracer to get a pair of
syscall notifications.  Also, the task can send itself signals (using
page faults, breakpoints, etc) and cause ptrace events via other
paths.
What are you thinking for a solution?
I'm writing a patch now.  It's an ABI break, but this thread seems to
show that the ABI was somewhat useless before the split-phase changes,
and it's differently broken now, so I would be surprised if the change
broke anything that was currently working.  I'll send it later today,
hopefully.
quoted hunk ↗ jump to hunk
As for capping SECCOMP_RET_ERRNO to MAX_ERRNO, how about this (sorry
if gmail butchers the paste):
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 4ef9687ac115..c88148d20bd5 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -629,7 +629,9 @@ static u32 __seccomp_phase1_filter(int this_syscall, struct

        switch (action) {
        case SECCOMP_RET_ERRNO:
-               /* Set the low-order 16-bits as a errno. */
+               /* Set the low-order bits as a errno. */
+               if (data > MAX_ERRNO)
+                       data = MAX_ERRNO;
                syscall_set_return_value(current, task_pt_regs(current),
                                         -data, 0);
                goto skip;
I'm fine with this, but I'm not entirely convinced it solves a
problem.  SECCOMP_RET_ERRNO | 5000 didn't work before, and it doesn't
work now.  Admittedly, the new failure mode is possibly better.

--Andy
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help