Re: [PATCH v7 3/6] seccomp: add a way to get a listener fd from ptrace
From: Christian Brauner <christian@brauner.io>
Date: 2018-10-11 01:51:52
Also in:
linux-api, linux-fsdevel, lkml
On Wed, Oct 10, 2018 at 10:26:22AM -0700, Tycho Andersen wrote:
> On Wed, Oct 10, 2018 at 07:15:02PM +0200, Christian Brauner wrote:
> > On Wed, Oct 10, 2018 at 09:54:58AM -0700, Tycho Andersen wrote:
> > > On Wed, Oct 10, 2018 at 05:39:57PM +0200, Christian Brauner wrote:
> > > > On Wed, Oct 10, 2018 at 05:33:43PM +0200, Jann Horn wrote:
> > > > > On Wed, Oct 10, 2018 at 5:32 PM Paul Moore [off-list ref] wrote:
> > > > > > On Tue, Oct 9, 2018 at 9:36 AM Jann Horn [off-list ref] wrote:
> > > > > > > +cc selinux people explicitly, since they probably have opinions on this
> > > > > >
> > > > > > I just spent about twenty minutes working my way through this thread, and digging through the containers archive trying to get a good understanding of what you guys are trying to do, and I'm not quite sure I understand it all. However, from what I have seen, this approach looks very ptrace-y to me (I imagine to others as well based on the comments) and because of this I think ensuring the usual ptrace access controls are evaluated, including the ptrace LSM hooks, is the right thing to do.
> > > > >
> > > > > Basically the problem is that this new ptrace() API does something that doesn't just influence the target task, but also every other task that has the same seccomp filter. So the classic ptrace check doesn't work here.
> > > >
> > > > Just to throw this into the mix: then maybe ptrace() isn't the right interface and we should just go with the native seccomp() approach for now.
> > >
> > > Please no :). I don't buy your arguments that 3-syscalls vs. one is better. If I'm doing this setup with a new container, I have to do clone(CLONE_FILES), do this seccomp thing, so that my parent can pick it up again, then do another clone without CLONE_FILES, because in the general case I don't want to share my fd table with the container, wait on the middle task for errors, etc. So we're still doing a bunch of setup, and it feels more awkward than ptrace, with at least as many syscalls, and it only works for your children.
> >
> > You're talking about the case where you already have shot yourself in the foot by blocking basically all other sensible ways of getting the fd out.
>
> Ok, but these other ways involve syscalls too (sendmsg() or whatever). And if you're going to allow arbitrary policy from your users, you have to be maximally flexible.

So, I totally like the idea of being able to get an fd before the filter is active. If this could be done with seccomp() alone it would be A+ (see Andy's mail in the other thread). But I really don't want to keep you working on this forever. :)

> > Also, this was meant to show that parts of your initial justification for implementing the ptrace() way of getting an fd doesn't really stand. And it doesn't really. Even with ptrace() you can get into situations where you're not able to get an fd. (see prior threads)
>
> Of course. I guess my point was that we shouldn't design an API that's impossible to use. I'll drop the notes about sendmsg() from the commit message.
>
> Tycho
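
For reference, the sendmsg() route mentioned above hands a descriptor to another process as SCM_RIGHTS ancillary data on a Unix socket. Below is a minimal Python sketch of that mechanism only. It is not taken from the patch series: a plain pipe stands in for the seccomp listener fd, both socket ends live in one process to keep it short, and send_fd(), recv_fd() and demo() are hypothetical helper names.

```python
import array
import os
import socket


def send_fd(sock: socket.socket, fd: int) -> None:
    """Send one file descriptor as SCM_RIGHTS ancillary data."""
    fds = array.array("i", [fd])
    # On Linux stream sockets, ancillary data needs at least one byte
    # of ordinary data to travel with, hence the dummy payload.
    sock.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock: socket.socket) -> int:
    """Receive one file descriptor sent by send_fd()."""
    fds = array.array("i")
    _, ancdata, _, _ = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            # Drop any trailing partial int before decoding.
            fds.frombytes(data[: len(data) - (len(data) % fds.itemsize)])
            return fds[0]
    raise RuntimeError("no file descriptor received")


def demo() -> bytes:
    """Pass a pipe's write end across a socketpair and write through the copy."""
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_end, write_end = os.pipe()  # stand-in for the listener fd (assumption)
    try:
        send_fd(left, write_end)
        received = recv_fd(right)  # a new descriptor for the same pipe
        os.write(received, b"hello")
        os.close(received)
        return os.read(read_end, 5)
    finally:
        os.close(read_end)
        os.close(write_end)
        left.close()
        right.close()


if __name__ == "__main__":
    print(demo())
```

In a real container setup the two socket ends would sit in different processes, and a filter that blocks sendmsg() in the sending task would rule this path out, which is the situation the thread is arguing about.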