Re: [PATCH v3 2/4] seccomp: Add SECCOMP_FILTER_FLAG_KILL_PROCESS

From: Kees Cook <hidden>
Date: 2017-08-11 18:32:57
Also in: linux-security-module, lkml

On Fri, Aug 11, 2017 at 9:58 AM, Tyler Hicks [off-list ref] wrote:

quoted

@@ -201,8 +203,25 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd,
       */
      for (; f; f = f->prev) {
              u32 cur_ret = BPF_PROG_RUN(f->prog, sd);
+             u32 action = cur_ret & SECCOMP_RET_ACTION;

-             if ((cur_ret & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION)) {
+             /*
+              * In order to distinguish between SECCOMP_RET_KILL and
+              * "higher priority" synthetic SECCOMP_RET_KILL_PROCESS
+              * identified by the kill_process filter flag, treat any
+              * case as immediately stopping filter processing. No
+              * higher priority action can exist, and we can't stop
+              * on the first RET_KILL (which may not have set
+              * f->kill_process) when a RET_KILL further up the filter
+              * list may have f->kill_process set which would go
+              * unnoticed.
+              */
+             if (unlikely(action == SECCOMP_RET_KILL && f->kill_process)) {
+                     *match = f;
+                     return cur_ret;
+             }

Why not let the application enforce this via the seccomp filter? In
other words, the first filter loaded with
SECCOMP_FILTER_FLAG_KILL_PROCESS set could have a rule in the filter
that only allows seccomp(2) to be called in the future with the
SECCOMP_FILTER_FLAG_KILL_PROCESS flag set.

I've been using the guide of "if SECCOMP_RET_KILL_PROCESS _did_ exist,
how would its semantics differ?"

In that magic world, it wouldn't be possible to create a seccomp
filter to screen out SECCOMP_RET_KILL_PROCESS. Also, being able to
distinguish between the two states (see below).

I understand the reasoning for wanting to enforce this automatically at
the kernel level but I think mixing return action priorities with filter
flags could be confusing and inflexible in the long run since filters
are inherited and your parent's desire to kill the entire thread group
may not mix with your desire to only kill a single thread.

Blocking the use of SECCOMP_FILTER_FLAG_KILL_PROCESS just means a
child can never perform a KILL_PROCESS, which doesn't really make much
sense, IMO.

The trouble may be that KILL_PROCESS would be used sparingly by either
parent or child, in the sense that maybe "unknown syscall gets
KILL_PROCESS, but 'connect' should just do KILL_THREAD". Or the
reverse. There isn't a way to mix combinations of return values across
filter chains without treating it exactly like a "real"
SECCOMP_RET_KILL_PROCESS would have worked. That means I have to treat
it as "higher priority" in the seccomp_run_filters() loop (which is
luckily very very cheap, as the "unlikely(register == zero)" test is
correct branch-predicted for the non-zero case, and the test is cheap
(we've already done the assignment which we need for the "<" test
below it, so it's a single pipelined instruction for the zero flag).

I don't expect to adjust KILL_THREAD vs KILL_PROCESS ever again, so
I'm not too worried about inflexibility.

What I don't get in this version is a _single_ filter being able to
distinguish between KILL_THREAD and KILL_PROCESS. Userspace is forced
to split up a rule if it wants to have different results. Also, parent
_can_ stop a child from escalating their KILL_THREADs to KILL_PROCESS
via the filter you mentioned, which is weird.

I spent some time trying to use the high bit in the return, to make
this signed, and in the end it was much much more ugly, and I didn't
want to deal with the fallout to userspace which may suddenly have to
deal with unexpected bits in the BPF return:

basically s/u32/s32/ in __seccomp_filter() and seccomp_run_filters().
add #define SECCOMP_RET_ACTION_FULL 0xffff0000
add #define SECCOMP_RET_KILL_PROC      0x80000000

Then use SECCOMP_RET_ACTION_FULL to mask everything (after forcing a u32 cast).

But the more I stare at this, the more I just want a value that that
works correctly without totally crazy flags and things.

Another way that this doesn't mix perfectly with the existing design is
when the action is unknown. In that situation, we treat it as RET_KILL.
However, this patch hard-codes the comparison with RET_KILL so we get
into this situation where an unknown action is treated as RET_KILL
except when the filter has the FILTER_FLAG_KILL_PROCESS flag set and
then this short-circuit doesn't kick in. It is a corner case, for sure,
but worth mentioning.

Hm, yeah, good point. This leaves unknown returns as KILL_THREAD, not
KILL_PROCESS.

Let me spent some more time looking at the high bit version of this...

-Kees

-- 
Kees Cook
Pixel Security

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help