Re: [PATCHv3 0/2] capability controlled user-namespaces

From: Mahesh Bandewar (महेश बंडेवार) <hidden>
Date: 2018-01-10 02:09:22
Also in: linux-api, lkml

On Tue, Jan 9, 2018 at 2:28 PM, Serge E. Hallyn [off-list ref] wrote:

Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):

On Mon, Jan 8, 2018 at 10:36 AM, Serge E. Hallyn [off-list ref] wrote:

Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):

On Mon, Jan 8, 2018 at 10:11 AM, Serge E. Hallyn [off-list ref] wrote:

Quoting Mahesh Bandewar (महेश बंडेवार) (maheshb@google.com):

On Mon, Jan 8, 2018 at 7:47 AM, Serge E. Hallyn [off-list ref] wrote:

Quoting James Morris (james.l.morris-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org):

On Mon, 8 Jan 2018, Serge E. Hallyn wrote:
I meant in terms of "marking" a user ns as "controlled" type -- it's
unnecessary jargon from an end user point of view.

Ah, yes, that was my point in

http://lkml.iu.edu/hypermail/linux/kernel/1711.1/01845.html
and
http://lkml.iu.edu/hypermail/linux/kernel/1711.1/02276.html

This may happen internally but don't make it a special case with a
different name and don't bother users with internal concepts: simply
implement capability whitelists with the default having equivalent

So the challenge is to have unprivileged users be contained, while
allowing trusted workloads in containers created by a root user to
bypass the restriction.

Now, the current proposal actually doesn't support a root user starting
an application that it doesn't quite trust in such a way that it *is*
subject to the whitelist.

Well, this is not hard, since a root process can spawn another process
and drop privileges before creating the user-ns, so that it is controlled
by the whitelist.

It would have to drop cap_sys_admin for the container to be marked as
"controlled", which may prevent the container runtime from properly starting
the container.

Yes, but that's a conflict between trusted operations (which require
SYS_ADMIN) and the untrusted processes they may spawn.

Not sure I understand what you're saying, but

I guess that in any case the task which is doing unshare(CLONE_NEWNS)
can drop cap_sys_admin first.  Though that is harder if using clone,
and it is awkward because it's not the container manager, but the user,
who will judge whether the container workload should be restricted.
So the container driver will add a flag like "run-controlled", and
the driver will convert that to dropping a capability; which again
is weird.  It would seem nicer to introduce a userns flag, 'caps-controlled'.
For an unprivileged userns, it is always set to 1, and root cannot
change it.  For a root-created userns, it stays 0, but root can set it
to 1 (using a /proc file?).  In this way either a container runtime or just an
admin script can say "no wait, I want this container to still be controlled".

Or we could instead add a second sysctl to decide whether all or only
'controlled' user namespaces should be controlled.  That's not pretty though.

Yes, I like your idea of a flag to clone() which will force the
user-ns to be controlled. This will have an effect only for the root
user; for any other user, specifying it is effectively a NOP, since those
namespaces will be controlled with or without that flag. But this is still
an enhancement to the current patch-set, and I don't mind doing it as a
follow-up after this patch-series.

At this moment James has asked for Eric's input, which I believe
hasn't been recorded.
