Re: [PATCH v8 6/8] ptrace,seccomp: Add PTRACE_SECCOMP support
From: Will Drewry <wad@chromium.org>
Date: 2012-02-21 17:31:08
Also in:
lkml, netdev
On Fri, Feb 17, 2012 at 4:55 PM, Indan Zupancic [off-list ref] wrote:
Hello, On Fri, February 17, 2012 17:23, Will Drewry wrote:quoted
On Thu, Feb 16, 2012 at 11:08 PM, Indan Zupancic [off-list ref] wrote:quoted
quoted
+/* Indicates if a tracer is attached. */ +#define SECCOMP_FLAGS_TRACED 0That's not the best way to check if a tracer is attached, and if you did use it for that, you don't need to toggle it all the time.It's logically no different than task->ptrace. If it is less desirable, that's fine, but it is functionally equivalent.Except that when using task->ptrace the ptrace code keeps track of it and clears it when the ptracer goes away. And you're toggling SECCOMP_FLAGS_TRACED all the time.
Yep, the code is gone in the coming version. It was ugly to need to change it everywhere TIF_SYSCALL_TRACE was toggled.
quoted
quoted
quoted
diff --git a/kernel/seccomp.c b/kernel/seccomp.c index c75485c..f9d419f 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c@@ -289,6 +289,8 @@ void copy_seccomp(struct seccomp *child,{ child->mode = prev->mode; child->filter = get_seccomp_filter(prev->filter); + /* Note, this leaves seccomp tracing enabled across fork. */ + child->flags = prev->flags;What if the child isn't ptraced?Then falling through with TIF_SYSCALL_TRACE will result in the SECCOMP_RET_TRACE events to be allowed, but this comes back to the race. If I can effectively "check" that ptrace did its job, then I think this becomes a non-issue.Yes. But it would be still sloppy state tracking, which can lead to all kind of unlikely but interesting scenario's. If the child is ever attached to later on, that flag will be still set. Same is true for any descendant, they all will have that flag copied.
Yup - it'd lead to tracehook fall through and an implicit allow. Not ideal at all.
quoted
quoted
quoted
} /**@@ -363,6 +365,19 @@ int __secure_computing_int(int this_syscall)syscall_rollback(current, task_pt_regs(current)); seccomp_send_sigtrap(); return -1; + case SECCOMP_RET_TRACE: + if (!seccomp_traced(¤t->seccomp)) + return -1; + /* + * Delegate to TIF_SYSCALL_TRACE. This allows fast-path + * seccomp calls to delegate to slow-path if needed. + * Since TIF_SYSCALL_TRACE will be unset on ptrace(2) + * continuation, there should be no direct side + * effects. If TIF_SYSCALL_TRACE is already set, this + * has no effect. + */ + set_tsk_thread_flag(current, TIF_SYSCALL_TRACE); + /* Falls through to allow. */This is nice and simple, but not race-free. You want to check if the ptracer handled the event or not. If the ptracer died before handling this then the syscall should be denied and the task should be killed.Hrm. I think there's a way to do this without forcing seccomp to always go slow path. I'll update the patch and see how it goes.You only have to go through the slow path for the SECCOMP_RET_TRACE case.
You'd have to know at syscall entry time to decide or pre-scan the bpf filter to see if SECCOMP_RET_TRACE is returned non-programmatically, then add a TIF flag for slow pathing, but that seems crazy bad.
But yeah, toggling TIF_SYSCALL_TRACE seems the only way to avoid the slow path, sometimes. The downside is that it's unexpected behaviour which may clash with arch entry code, so I'm not sure if that's a good idea. I think always going through the slow path isn't too bad, compared to the ptrace alternative it's still a lot faster.
Since supporting that behavior is documented as a prerequisite for adding HAVE_ARCH_SECCOMP_FILTER, I don't see how it could be unexpected behavior. On systems, like x86, where seccomp is always slowpath, it has no impact. However, it means that if a fast path is added (like audit), then it will need to know to re-check the threadinfo flags. I don't want to try to optimize in advance, but it'd be nice to not close off any options for later. If an explicit ptrace_event(SECCOMP) call was being made, then we'd be stuck in the slow path or stuck making the ptrace code have more ifs for determining if the source was a normal ptrace event or special seccomp-triggered one. That might be okay as a long-term-thing, though, since the other option (which the incoming patchset does) is to add a post-trace callback into seccomp. I'm not sure which is truly preferable.
quoted
quoted
Many people would like a PTRACE_O_KILL_TRACEE_IF_DEBUGGER_DIES option, Oleg was working on that, among other things. Perhaps re-use that to handle this case too?Well, if you can inject initial code into the tracee, then it can call prctl(PR_SET_PDEATHSIG, SIGKILL). Then when the tracer dies, the child dies.That only works for child tracees, not descendants of the tracee.
True enough.
quoted
If the SIGKILL race in arch_ptrace_... is resolved, then a SIGKILL that arrives between seccomp and delegation to ptrace should result in process death. Though perhaps my proposal above will make seccomp's integration with ptrace less subject to ptrace behaviors.Oleg fixed the SIGKILL problem (it wasn't a race), it should go upstream in the next kernel version, I think.
Pick your own name for it then, I guess. The signal lock was held in ptrace_notify. Then, in order to hand off to the arch_ptrace_notify code, it releases the lock, then claims it again after. If SIGKILL was delivered in that time window, then the post-arch-handoff code would see it, skip notification of the tracer, and allow the syscall to run prior to terminating the task. I'm excited to see it fixed :)
quoted
quoted
quoted
case SECCOMP_RET_ALLOW:For this and the ERRNO case you could check that PTRACE_O_SECCOMP option and decide to do something or not in ptrace.For ERRNO, I'd prefer not to since it adds implicit behavior to the rules and, without pulling a ptrace_event()ish call into this code, it would change the return flow and potentially open up errno, which should be solid, to races, etc. For ALLOW, sure, but at that point, just use PTRACE_SYSCALL. Perhaps this can all be ameliorated if I can get a useful ptrace_entry completed notification.You don't want ptrace to be able to override the decision? Fair enough. Or did you mean something else?
Exactly. The first time I went down this path, I let a tracer pick up any denied syscalls, but that complicated the interactions and security model quite a bit. I also don't want to add an implicit dependency on the syscall slow-path for any other return values -- just in case the proposed TIF_SYSCALL_TRACE approach isn't acceptable. thanks! will