Re: [PATCH v10 0/11] seccomp: add thread sync ability
From: Kees Cook <hidden>
Date: 2014-07-16 21:23:35
Also in:
linux-api, linux-arch, linux-arm-kernel, lkml
On Wed, Jul 16, 2014 at 12:45 PM, Andy Lutomirski [off-list ref] wrote:
On Wed, Jul 16, 2014 at 10:54 AM, Kees Cook [off-list ref] wrote:quoted
On Mon, Jul 14, 2014 at 12:04 PM, Andy Lutomirski [off-list ref] wrote:quoted
On Fri, Jul 11, 2014 at 10:55 AM, Kees Cook [off-list ref] wrote:quoted
On Fri, Jul 11, 2014 at 9:49 AM, Oleg Nesterov [off-list ref] wrote:quoted
On 07/10, Kees Cook wrote:quoted
This adds the ability for threads to request seccomp filter synchronization across their thread group (at filter attach time). For example, for Chrome to make sure graphic driver threads are fully confined after seccomp filters have been attached. To support this, locking on seccomp changes via thread-group-shared sighand lock is introduced, along with refactoring of no_new_privs. Races with thread creation are handled via delayed duplication of the seccomp task struct field and cred_guard_mutex. This includes a new syscall (instead of adding a new prctl option), as suggested by Andy Lutomirski and Michael Kerrisk.I do not not see any problems in this version,Awesome! Thank you for all the reviews. :) If Andy and Michael are happy with this too, I think this is in good shape. \o/I think I'm happy with it. Is it in git somewhere for easy perusal? I have a cold, so my reviewing ability is a bit off, but I want to take a look at the final version, and git is a little easier than email for this.Hi Andy, Have you had a chance to look v10 over? I'd like to send a v11 with Oleg's Reviewed-by added (at James Morris's request). Should I add one from you as well?Trivial nits (take them or leave them): seccomp_check_mode confused me. Maybe seccomp_may_assign_mode would be a better name.
Good idea; I was unhappy with this name too. Done for v11.
In the writer locking changelog, should "(which is unique to the thread group)" be "(which is shared by all threads in the thread group)"?
Done.
This bit:
/*
* Explicitly enable no_new_privs here in case it got set
* between the task_struct being duplicated and holding the
* sighand lock. The seccomp state and nnp must be in sync.
*/
if (task_no_new_privs(current))
task_set_no_new_privs(p);
should arguably move the very end of task duplication -- someone will
want it for some other purpose some day.That certainly seems plausible, but for the moment, I want to keep nnp near seccomp, since that's where we have a race.
This bit:
/* If we have a seccomp mode, enable the thread flag. */
if (p->seccomp.mode != SECCOMP_MODE_DISABLED)
set_tsk_thread_flag(p, TIF_SECCOMP);
is not quite obvious to me -- it's not obvious why it's needed. If
it's to eliminate a race, can you explain the race in the comment? If
it's just paranoia, a WARN_ON or BUG_ON might be worthwhile.I've expanded the comment; it's the same issue as nnp: seccomp could have changed, so if we gained a mode, we need to set the flag too.
SECCOMP_FILTER_FLAG_MASK seems backwards to me. Maybe rename it to SECCOMP_FILTER_ALLOWED_FLAGS and invert it?
Excellent point. Other kernel users of the _MASK name aren't inverted. Fixed.
is_ancestor: calling the first parameter "candidate" is just a tiny bit odd -- it's the child that's up for consideration. (This is barely even worth the time it took me to type it.)
Switch argument and comment to "parent".
Less trivial nits: Can you change: fp = kzalloc(fp_size, GFP_KERNEL|__GFP_NOWARN); to fp = kcalloc(fprog->len, sizeof(struct sock_filter), GFP_KERNEL|__GFP_NOWARN); somewhere in this series (w/ a changelog comment that it's not exploitable to avoid scaring people).
Done, though I used a BUG_ON since the compiler will optimize it out (since it's an impossible state to be in).
In seccomp_prepare_user_filter, would it make sense to return -EINVAL if !user_filter? That will make it slightly more pleasant to implement TSYNC-without-change if anyone ever wants it. (This isn't really necessary -- it's just slightly more polite.)
I can't do this since EFAULT is already used to detect seccomp capabilities from userspace.
Once James picks this up, I'll rebase my series on top of it. I think they'll conflict slightly.
It's not bad. I tripped over sd -> sd_local and the related sd vs &sd in the SK_RUN_FILTER call, but otherwise it was pretty trivial.
Feel free to add my Reviewed-by to the whole series except the ARM and MIPS patches.
Thanks!
And some thoughts:
Your changelog comment for the seccomp syscall suggests that you're
thinking about extending the tracer interface. I'd suggest a rather
different design: add a concept of a seccomp monitor associated with a
particular filter and an action SECCOMP_RET_MONITOR.
SECCOMP_RET_MONITOR will tell the monitor what syscall was attempted
and will wait for further instructions. The monitor can then ask for
zero or more syscalls to be issued on behalf of the waiting process
and then resume it. Each of those additional syscalls will be further
filtered through all seccomp filters outside of the one that returned
SECCOMP_RET_MONITOR.
This would avoid all of the nastiness inherent in using ptrace and
would nest much more nicely. And it could be far faster.
If you decide to do something to ARM along the lines of what I'm doing
to x86, you may want to fix this:
/*
* Make sure that any changes to mode from another thread have
* been seen after TIF_SECCOMP was seen.
*/
rmb();
It should have essentially no effect on x86, though.Noted, thanks! -Kees -- Kees Cook Chrome OS Security