Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters

[PATCH v6 0/6] namei: resolveat(2) path resolution restriction API · Aleksa Sarai <hidden> · 2019-05-06
[PATCH v6 1/6] namei: split out nd->dfd handling to dirfd_path_init · Aleksa Sarai <hidden> · 2019-05-06
[PATCH v6 3/6] namei: LOOKUP_IN_ROOT: chroot-like path resolution · Aleksa Sarai <hidden> · 2019-05-06
[PATCH v6 2/6] namei: O_BENEATH-style path resolution flags · Aleksa Sarai <hidden> · 2019-05-06
[PATCH v6 4/6] namei: aggressively check for nd->root escape on ".." resolution · Aleksa Sarai <hidden> · 2019-05-06
[PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Aleksa Sarai <hidden> · 2019-05-06
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Jann Horn <jannh@google.com> · 2019-05-06
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Aleksa Sarai <hidden> · 2019-05-06
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Andy Lutomirski <luto@amacapital.net> · 2019-05-06
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Aleksa Sarai <hidden> · 2019-05-08
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Jann Horn <jannh@google.com> · 2019-05-10
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Andy Lutomirski <luto@kernel.org> · 2019-05-10
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Jann Horn <jannh@google.com> · 2019-05-10
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Christian Brauner <christian@brauner.io> · 2019-05-10
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Aleksa Sarai <hidden> · 2019-05-11
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Andy Lutomirski <luto@amacapital.net> · 2019-05-11
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Aleksa Sarai <hidden> · 2019-05-11
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Linus Torvalds <torvalds@linux-foundation.org> · 2019-05-11
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Linus Torvalds <torvalds@linux-foundation.org> · 2019-05-11
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Aleksa Sarai <hidden> · 2019-05-11
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Linus Torvalds <torvalds@linux-foundation.org> · 2019-05-11
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Christian Brauner <christian@brauner.io> · 2019-05-11
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Aleksa Sarai <hidden> · 2019-05-11
Re: [PATCH v6 5/6] binfmt_*: scope path resolution of interpreters · Andy Lutomirski <luto@kernel.org> · 2019-05-11
[PATCH v6 6/6] namei: resolveat(2) syscall · Aleksa Sarai <hidden> · 2019-05-06

From: Andy Lutomirski <luto@kernel.org>
Date: 2019-05-11 22:40:02
Also in: linux-arch, linux-fsdevel, lkml

On May 11, 2019, at 10:21 AM, Linus Torvalds [off-list ref] wrote:

> On Sat, May 11, 2019 at 1:00 PM Andy Lutomirski [off-list ref] wrote:
>
>> A better “spawn” API should fix this.
>
> Andy, stop with the "spawn would be better".

It doesn’t have to be spawn per se.  But the current situation sucks.

> Notice? None of the real problems are about execve or would be solved
> by any spawn API. You just think that because you've apparently been
> talking to too many MS people that think fork (and thus indirectly
> execve()) is bad process management.

I’ve literally never spoken to an MS person about it.

What container managers and init systems *want* is a way to drop
privileges, change namespaces, etc and then run something in a
controlled way so that the intermediate states aren’t dangerous. An
API for this could be spawn-like or exec-like — that particular
distinction is beside the point.  Having personally written code that
mucks with namespaces, I've wanted two particular abilities that are
both quite awkward:

a) Change all my UIDs and GIDs to match a container, enter that
container's namespaces, and run some binary in the container's
filesystem, all atomically enough that I don't need to worry about
accidentally leaking privileges into the container.  A
super-duper-non-dumpable mode would kind of allow this, but I'd worry
that there's some other hole besides ptrace() and /proc/self.

b) Change all my UIDs and GIDs to match a container, enter that
container's namespaces, and run some binary that is *not* in the
container's filesystem.  This happens, for example, if the container's
mount namespace has no exec mounts at all.  We don't have a fantastic
way to do this at all right now due to /proc/self/exe.
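
To make the non-atomicity in (a) and (b) concrete, here is a minimal
sketch of how these steps are typically strung together from userspace
today.  enter_and_exec() and its parameters are hypothetical names used
only for illustration, not an existing API, and a real tool would need
to handle more than this (for example, joining a PID namespace only
affects children, so it would also have to fork).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <grp.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): join each namespace in
 * ns_fds[] (open fds on /proc/<pid>/ns/<type> files), switch to the
 * given IDs, then exec the already-open binary exe_fd.
 * Returns -1 on failure; does not return on success.
 */
int enter_and_exec(const int *ns_fds, size_t n_ns, uid_t uid, gid_t gid,
                   int exe_fd, char *const argv[], char *const envp[])
{
    assert(ns_fds != NULL || n_ns == 0);

    /* Step 1: one setns() call per namespace. */
    for (size_t i = 0; i < n_ns; i++) {
        if (setns(ns_fds[i], 0) < 0)  /* nstype 0: accept any type */
            return -1;
    }

    /*
     * Step 2: drop supplementary groups, then GIDs, then UIDs.  If a
     * user namespace was joined above, these IDs are interpreted
     * relative to it.
     */
    if (setgroups(0, NULL) < 0)
        return -1;
    if (setresgid(gid, gid, gid) < 0)
        return -1;
    if (setresuid(uid, uid, uid) < 0)
        return -1;

    /*
     * Step 3: exec by fd, so the binary need not be reachable by path
     * from the new mount namespace (case b).
     */
    fexecve(exe_fd, argv, envp);
    return -1;  /* only reached if fexecve() failed */
}
```

Every call here is a separate syscall, so anything that can observe or
interfere with the process between two of them sees an intermediate
state.  That window is what an atomic exec-like or spawn-like interface
would close.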

Regardless, the actual CVE at hand would have been nicely avoided if
writing to /proc/self/exe didn’t work, and I see no reason we can’t
make that happen.

I suppose we could also consider a change to disable /proc/self/exe if
it's not reachable from /proc/self/root.  By "disable", I mean that
readlink() should maybe still work, but actually trying to open it
could probably fail safely.
