[PATCH] userns: honour no_new_privs for cap_bset during user ns creation/switch

From: Christian Brauner <hidden>
Date: 2018-01-03 11:23:27
Also in: lkml

Possibly related (same subject, not in this thread)

2017-12-22 · Re: [PATCH] userns: honour no_new_privs for cap_bset during user ns creation/switch · Eric W. Biederman <hidden>
2017-12-22 · Re: [PATCH] userns: honour no_new_privs for cap_bset during user ns creation/switch · Maciej Żenczykowski <hidden>
2017-12-22 · Re: [PATCH] userns: honour no_new_privs for cap_bset during user ns creation/switch · Eric W. Biederman <hidden>
2017-12-22 · Re: [PATCH] userns: honour no_new_privs for cap_bset during user ns creation/switch · Maciej Żenczykowski <hidden>
2017-12-21 · Re: [PATCH] userns: honour no_new_privs for cap_bset during user ns creation/switch · Eric W. Biederman <hidden>

On Fri, Dec 22, 2017 at 08:08:04AM -0600, Eric W. Biederman wrote:

Maciej ?enczykowski [off-list ref] writes:

quoted

Good point about CAP_DAC_OVERRIDE on files you own.

I think there is an argument that you are playing dangerous games with
the permission system there, as it isn't effectively a file you own if
you can't read it, and you can't change it's permissions.

Append-only files are useful - particularly for logging.
It could also simply be a non-readable file on a R/O filesystem.

quoted

Given little things like that I can completely see no_new_privs meaning
you can't create a user namespace.  That seems consistent with the
meaning and philosophy of no_new_privs.  So simple it is hard to get
wrong.

Yes, I could totally buy the argument that no_new_privs should prevent
creating a user ns.

However, there's also setns() and that's a fair bit harder to reason about.
Entirely deny it?  But that actually seems potentially useful...
Allow it but cap it?  That's what this does...

quoted

We could do more clever things like plug this whole in user namespaces,
and that would not hurt my feelings.

Sure, this particular one wouldn't be all that easy I think... and how
many such holes are there?
I found this particular one *after* your first reply in this thread.

quoted

However unless that is our only
choice to avoid badly breaking userspace I would have to have to depend
on user namespaces being perfect for no_new_privs to be a proper jail.

This stuff is ridiculously complex to get right from userspace. :-(

quoted

As a general rule user namespaces are where we tackle the subtle scary
things that should work, and no_new_privs is where we implement a simple
hard to get wrong jail.  Most of the time the effect is the same to an
outside observer (bounded permissions), but there is a real difference
in difficulty of implementation.

So, where to now...

Would you accept patches that:

- make no_new_priv block user ns creation?

- make no_new_priv block user ns transition?

Yes.

The approach will need to be rethought if there is anything deliberately
combining user namespaces and no_new_privs.  As regressions are a no-no.
So we need wide spread testing, to avoid that.

Just to be clear:  as soon as you set no_new_privs you should not be
able to create new user namespace anymore and you shouldn't be able to
attach to user namespaces anymore.
This should work and not cause regressions for e.g. liblxc were we
always set no_new_privs as one of the last steps and especially after we
have clone(CLONE_NEWUSER) the container and - for the attach case -
after we have already done the setns(fd, CLONE_NEWUSER). These use-cases
should be preserved. But it seems to me that they will be. But I'd like
to test this once patches are floating around.

But as much as possible I want no_new_privs to be simple and doing it's
job.

I will also take and encourage patches that close this minor privilege
escalation from the user namespace side.  As ideally creating a user
namespace should be as safe as no_new_privs.

quoted

Or perhaps we can assume that lack of create privs is sufficient, and
if there's a pre-existing user ns for you to enter, then that's
acceptable...
Although this implies you probably always want to combine no_new_privs
with a leaf user ns, or no_new_privs isn't all that useful for root in
root ns...
This added complexity, probably means it should be blocked...

Yes.

quoted

- inherits bset across user ns creation/transition based on X?
[this is the one we care about, because there are simply too many bugs
in the kernel wrt. certain caps]

That was my suspicion, and attack surface reduction is a different
discussion.  Would no_new_privs preventing a userns transition be enough
for the cases you care about?

Otherwise this is a different conversation because it is not about
semantics but about making the code safer to use.  In general if code is
simply not safe to user in a user namespace I would prefer to tighten
the permission checks, and just not allow that code.

Mostly what I have seen in previous conversations is simply concerns
about code that is not used or needed, being a problem.

quoted

X could be:
- a new flag similar to no_new_priv
- a new securebit flag (w/lockbit)  [provided securebits survive a
userns transition, haven't checked]
- or perhaps a new capability
- something else?

How do we make forward progress?

We start by causing no_new_privs to block userns creation and entering.

That sounds reasonable. One thing I thought about is what the
interaction between no_new_privs and /proc/sys/user/max_user_namespaces
should look like after this change. Both will basically have the same
effect, right? So wouldn't it make sense to have no_new_privs imply that
/proc/sys/user/max_user_namespaces is 0 and becomes read-only in the
in the calling process' user namespace (e.g. returning EINVAL would make
sense)? I worry that otherwise we might confuse users when they see that
/proc/sys/user/max_user_namespaces is not 0 but they still aren't able
to create user namespaces. I know that having multiple security options
block the same thing independent of each other is not unprecedented
(e.g. in addition to /proc/sys/user/max_user_namespaces CentOS and RHEL
have an additional boot parameter to {dis,en}able user namespace
creation which often confuses the heck out of users) but I'd rather not
make this a common scenario when we can establish a sensible interaction
between two security settings.

Christian
--
To unsubscribe from this list: send the line "unsubscribe linux-security-module" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help