[PATCH v5 3/5] x86: Split syscall_trace_enter into two phases
From: luto@amacapital.net (Andy Lutomirski)
Date: 2015-02-06 20:20:44
Also in:
linux-arch, linux-mips, lkml
On Fri, Feb 6, 2015 at 12:16 PM, Kees Cook [off-list ref] wrote:
On Fri, Feb 6, 2015 at 12:12 PM, Andy Lutomirski [off-list ref] wrote:quoted
On Fri, Feb 6, 2015 at 12:07 PM, Kees Cook [off-list ref] wrote:quoted
On Fri, Feb 6, 2015 at 11:32 AM, Andy Lutomirski [off-list ref] wrote:quoted
On Fri, Feb 6, 2015 at 11:23 AM, Kees Cook [off-list ref] wrote:quoted
And especially since a ptracer can change syscalls during syscall-enter-stop to any syscall it wants, bypassing seccomp. This condition is already documented.If a ptracer (using PTRACE_SYSCALL) were to get the entry callback before seccomp, then this oddity would go away, which might be a good thing. A ptracer could change the syscall, but seccomp would based on what the ptracer changed the syscall to.I want kill events to trigger immediately. I don't want to leave the ptrace surface available on a SECCOMP_RET_KILL. So maybe it can be seccomp phase 1, then ptrace, then seccomp phase 2? And pass more information between phases to determine how things should behave beyond just "skip"?I thought so too, originally, but I'm far less convinced now, for two reasons: 1. I think that a lot of filters these days use RET_ERRNO heavily, so this won't benefit them. 2. I'm not convinced it really reduces the attack surface for anyone. Unless your filter is literally "return SECCOMP_RET_KILL", then the seccomp-filtered task can always cause the ptracer to get a pair of syscall notifications. Also, the task can send itself signals (using page faults, breakpoints, etc) and cause ptrace events via other paths.What are you thinking for a solution?
I'm writing a patch now. It's an ABI break, but this thread seems to show that the ABI was somewhat useless before the split-phase changes, and it's differently broken now, so I would be surprised if the change broke anything that was currently working. I'll send it later today, hopefully.
quoted hunk ↗ jump to hunk
As for capping SECCOMP_RET_ERRNO to MAX_ERRNO, how about this (sorry if gmail butchers the paste):diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 4ef9687ac115..c88148d20bd5 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c@@ -629,7 +629,9 @@ static u32 __seccomp_phase1_filter(int this_syscall, struct switch (action) { case SECCOMP_RET_ERRNO: - /* Set the low-order 16-bits as a errno. */ + /* Set the low-order bits as a errno. */ + if (data > MAX_ERRNO) + data = MAX_ERRNO; syscall_set_return_value(current, task_pt_regs(current), -data, 0); goto skip;
I'm fine with this, but I'm not entirely convinced it solves a problem. SECCOMP_RET_ERRNO | 5000 didn't work before, and it doesn't work now. Admittedly, the new failure mode is possibly better. --Andy