Re: [PATCH 0/3] Introduce user namespace capabilities

From: John Johansen <john.johansen@canonical.com>
Date: 2024-05-21 14:29:56
Also in: keyrings, linux-fsdevel, lkml

On 5/18/24 05:20, Serge Hallyn wrote:

On Fri, May 17, 2024 at 10:53:24AM -0700, Casey Schaufler wrote:

On 5/17/2024 4:42 AM, Jonathan Calmels wrote:

On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:

I suggest that adding a capability set for user namespaces is a bad idea:
	- It is in no way obvious what problem it solves
	- It is not obvious how it solves any problem
	- The capability mechanism has not been popular, and relying on a
	  community (e.g. container developers) to embrace it based on this
	  enhancement is a recipe for failure
	- Capabilities are already more complicated than modern developers
	  want to deal with. Adding another, special purpose set, is going
	  to make them even more difficult to use.

Sorry if the commit wasn't clear enough.

While, as others have pointed out, the commit description left
much to be desired, that isn't the biggest problem with the change
you're proposing.

  Basically:

- Today user namespaces grant full capabilities.

Of course they do. I have been following the use of capabilities
in Linux since before they were implemented. The uptake has been
disappointing in all use cases.

   This behavior is often abused to attack various kernel subsystems.

Yes. The problems of a single, all powerful root privilege scheme are
well documented.

   Only option

Hardly.

  is to disable them altogether which breaks a lot of
   userspace stuff.

Updating userspace components to behave properly in a capabilities
environment has never been a popular activity, but is the right way
to address this issue. And before you start on the "no one can do that,
it's too hard", I'll point out that multiple UNIX systems supported
rootless, all capabilities based systems back in the day.

   This goes against the least privilege principle.

If you're going to run userspace that *requires* privilege, you have
to have a way to *allow* privilege. If the userspace insists on a root
based privilege model, you're stuck supporting it. Regardless of your
principles.

Casey,

I might be wrong, but I think you're misreading this patchset.  It is not
about limiting capabilities in the init user ns at all.  It's about limiting
the capabilities which a process in a child userns can get.

Any unprivileged task can create a new userns, and get a process with
all capabilities in that namespace.  Always.  User namespaces were a
great success in that we can do this without any resulting privilege
against host owned resources.  The unaddressed issue is the expanded
kernel code surface area.

You say, above, (quoting out of place here)

Updating userspace components to behave properly in a capabilities
environment has never been a popular activity, but is the right way
to address this issue. And before you start on the "no one can do that,
it's too hard", I'll point out that multiple UNIX systems supported

He's not saying no one can do that.  He's saying, correctly, that the
kernel currently offers no way for userspace to do this limiting.  His
patchset offers two ways: one system wide capability mask (which applies
only to non-initial user namespaces) and one per-process inherited one
which - yay - userspace can use to limit what its children will be
able to get if they unshare a user namespace.
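
To make the combination concrete, here is a toy model in plain Python, not kernel code. It assumes that root in a new non-initial user namespace receives the intersection of the creator's per-process set (pU) and the system-wide mask. The names `caps_in_new_userns`, `system_mask` and `launcher_pu` are hypothetical; only the capability numbers come from `<linux/capability.h>`.

```python
# Toy model (assumption, not the patchset's code): capability sets are ints
# used as bitmasks, one bit per capability number from <linux/capability.h>.
CAP_SYS_ADMIN = 21
CAP_MAC_ADMIN = 33
CAP_LAST_CAP = 40  # CAP_CHECKPOINT_RESTORE on recent kernels

CAP_FULL_SET = (1 << (CAP_LAST_CAP + 1)) - 1


def bit(cap: int) -> int:
    """Return the bitmask for a single capability number."""
    return 1 << cap


def caps_in_new_userns(creator_pu: int, system_mask: int) -> int:
    """Capabilities granted to root in a freshly created, non-initial userns.

    Today the grant is always CAP_FULL_SET. In this hypothetical model it is
    limited both by the creator's per-process set (pU) and by a system-wide
    mask that only applies to non-initial user namespaces.
    """
    return CAP_FULL_SET & creator_pu & system_mask


# The administrator masks off CAP_SYS_ADMIN for every child namespace...
system_mask = CAP_FULL_SET & ~bit(CAP_SYS_ADMIN)
# ...and a sandbox launcher additionally drops CAP_MAC_ADMIN for its children.
launcher_pu = CAP_FULL_SET & ~bit(CAP_MAC_ADMIN)

granted = caps_in_new_userns(launcher_pu, system_mask)
print(bool(granted & bit(CAP_SYS_ADMIN)), bool(granted & bit(CAP_MAC_ADMIN)))
```

Running it prints `False False`: each limit independently removes one capability, and everything else is still granted.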

- It adds a new capability set.

Which is a really, really bad idea. The equation for calculating effective
privilege is already more complicated than userspace developers are generally
willing to put up with.

This is somewhat true, but I think the semantics of what is proposed here are
about as straightforward as you could hope for, and you can basically reason
about them completely independently of the other sets.  Only when reasoning
about the correctness of this code do you need to consider the other sets.  Not
when administering a system.

If you want root in a child user namespace to not have CAP_MAC_ADMIN, you drop
it from your pU.  Simple as that.
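
The same point as a per-process toy model, again hypothetical Python rather than the patchset's interface. It assumes (as labeled in the comments) that pU is inherited across fork like the bounding set and can only be lowered, so a single drop in a launcher covers every descendant that later unshares a user namespace. `Task` and its methods are invented for illustration.

```python
# Hypothetical model of the proposed per-process userns set (pU).
CAP_MAC_ADMIN = 33
CAP_LAST_CAP = 40  # CAP_CHECKPOINT_RESTORE on recent kernels
CAP_FULL_SET = (1 << (CAP_LAST_CAP + 1)) - 1


class Task:
    """Toy process that carries only its pU set."""

    def __init__(self, pu: int = CAP_FULL_SET) -> None:
        self.pu = pu

    def drop_userns_cap(self, cap: int) -> None:
        # Assumed drop-only semantics: bits can be cleared, never re-added.
        self.pu &= ~(1 << cap)

    def fork(self) -> "Task":
        # Assumed inheritance across fork/exec, like the bounding set.
        return Task(self.pu)

    def unshare_userns_root_caps(self) -> int:
        # Root in the new user namespace gets pU instead of the full set.
        return self.pu


launcher = Task()
launcher.drop_userns_cap(CAP_MAC_ADMIN)
grandchild = launcher.fork().fork()
print(bool(grandchild.unshare_userns_root_caps() & (1 << CAP_MAC_ADMIN)))
```

It prints `False`: the grandchild never touched its own pU, yet root in its new user namespace lacks CAP_MAC_ADMIN.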

   This set dictates what capabilities are granted in namespaces (instead
   of always getting full caps).

I would not expect container developers to be eager to learn how to use
this facility.

I'm a container developer, and I'm excited about it :)

   This brings namespaces in line with the rest of the system, user
   namespaces are no more "special".

I'm sorry, but this makes no sense to me whatsoever. You want to introduce
a capability set explicitly for namespaces in order to make them less
special?

Yes, exactly.

Maybe I'm just old and cranky.

That's fine.

   They now work the same way as say a transition to root does with
   inheritable caps.

That needs some explanation.

- This isn't intended to be used by end users per se (although they could).
   This would be used at the same places where existing capabilities are
   used today (e.g. init system, pam, container runtime, browser
   sandbox), or by system administrators.

I understand that. It is for containers. Containers are not kernel entities.

User namespaces are.

This patch set provides userspace a way of limiting the kernel code exposed
to untrusted children, which currently does not exist.

Theoretically, yes, but I am worried that in practice the existing utilities
still allow untrusted code to access user namespaces.

In practice we have found that we need to allow a different set of capabilities
when bwrap is called from flatpak than when called on its own etc. We see the
same pattern with unshare and other utilities around launching applications
in user namespaces.

In practice, at the distro level, I don't see this approach actually helping,
because we have so many uses that require exposing close to the full capability
set in multiple utilities that many different applications depend on.

To be clear, this doesn't stop distros from doing something more, but is it
worth the added complexity if, in practice, it can't be used effectively?
I really don't have the answer.