Re: [RFC v2 19/19] ima: Setup securityfs for IMA namespace

From: Christian Brauner <hidden>
Date: 2021-12-06 12:09:01
Also in: linux-security-module, lkml

On Fri, Dec 03, 2021 at 11:37:14AM -0800, Casey Schaufler wrote:

On 12/3/2021 10:50 AM, James Bottomley wrote:

quoted

On Fri, 2021-12-03 at 13:06 -0500, Stefan Berger wrote:

quoted

On 12/3/21 12:03, James Bottomley wrote:

quoted

On Thu, 2021-12-02 at 21:31 -0500, Stefan Berger wrote:
[...]

quoted

   static int securityfs_init_fs_context(struct fs_context *fc)
   {
+	int rc;
+
+	if (fc->user_ns->ima_ns->late_fs_init) {
+		rc = fc->user_ns->ima_ns->late_fs_init(fc->user_ns);
+		if (rc)
+			return rc;
+	}
   	fc->ops = &securityfs_context_ops;
   	return 0;
   }

I know I suggested this, but to get this to work in general, it's
going to have to not be specific to IMA, so it's going to have to
become something generic like a notifier chain.  The other problem
is it's only working still by accident:

I had thought about this also but the rationale was:

securityfs is compiled due to CONFIG_IMA_NS and the user namespace
exists there and that has a pointer now to ima_namespace, which can
have that callback. I assumed that other namespaced subsystems could
also be  reached then via such a callback, but I don't know.

Well securityfs is supposed to exist for LSMs.  At some point each of
those is going to need to be namespaced, which may eventually be quite
a pile of callbacks, which is why I thought of a notifier.

While AppArmor, lockdown and the integrity family use securityfs,
SELinux and Smack do not. They have their own independent filesystems.
Implementations of namespacing for each of SELinux and Smack have been
proposed, but nothing has been adopted. It would be really handy to
namespace the infrastructure rather than each individual LSM, but I
fear that's a bigger project than anyone will be taking on any time
soon. It's likely to encounter many of the same issues that I've been
dealing with for module stacking.

The main thing that bothers me is that it uses simple_pin_fs() and
simple_unpin_fs() which I would try hard to get rid of if possible. The
existence of this global pinning logic makes namespacing it properly
more difficult then it needs to be and it creates imho wonky semantics
where the last unmount doesn't really destroy the superblock. Instead
subsequents mounts resurface the same superblock. There might be an
inherent design reason why this needs to be this way but I would advise
against these semantics for anything that wants to be namespaced.
Probably the first securityfs mount in init_user_ns can follow these
semantics but ones tied to a non-initial user namespace should not as
the userns can go away. In that case the pinning logic seems strange as
conceptually the userns pins the securityfs mount as evidenced by the
fact that we key by it in get_tree_keyed().

quoted

I suppose any late filesystem init callchain would have to be
connected to the user_namespace somehow?

I don't think so; I think just moving some securityfs entries into the
user_namespace and managing the notifier chain from within securityfs
will do for now.  [although I'd have to spec this out in code before I
knew for sure].

quoted

+int ima_fs_ns_init(struct ima_namespace *ns)
+{
+	ns->mount = securityfs_ns_create_mount(ns->user_ns);

This actually triggers on the call to securityfs_init_fs_context,
but nothing happens because the callback is null.  Every subsequent
use of fscontext will trigger this.  The point of a keyed supeblock
is that fill_super is only called once per key, that's the place we
should be doing this.   It should also probably be a blocking
notifier so anyconsumer of securityfs can be namespaced by
registering for this notifier.

What I don't like about the fill_super is that it gets called too
early:

[   67.058611] securityfs_ns_create_mount @ 102 target user_ns:
ffff95c010698c80; nr_extents: 0
[   67.059836] securityfs_fill_super @ 47  user_ns:
ffff95c010698c80;
nr_extents: 0

Right, it's being activated by securityfs_ns_create_mount which is
called as soon as the user_ns is created.

quoted

We are switching to the target user namespace in
securityfs_ns_create_mount. The expected nr_extents at this point is
0, since user_ns hasn't been configured, yet. But then
security_fill_super is also called with nr_extents 0. We cannot use
that, it's too early!

Exactly, so I was thinking of not having a securityfs_ns_create_mount
at all.  All the securityfs_ns_create.. calls would be in the notifier
call chain. This means there's nothing to fill the superblock until an
actual mount on it is called.

quoted

+	if (IS_ERR(ns->mount)) {
+		ns->mount = NULL;
+		return -1;
+	}
+	ns->mount_count = 1;

This is a bit nasty, too: we're spilling the guts of mount count
tracking into IMA instead of encapsulating it inside securityfs.

Ok, I can make this disappear.

quoted

+
+	/* Adjust the trigger for user namespace's early teardown of
dependent
+	 * namespaces. Due to the filesystem there's an additional
reference
+	 * to the user namespace.
+	 */
+	ns->user_ns->refcount_teardown += 1;
+
+	ns->late_fs_init = ima_fs_ns_late_init;
+
+	return 0;
+}

I think what should be happening is that we shouldn't so the
simple_pin_fs, which creates the inodes, ahead of time; we should
do it inside fill_super using a notifier, meaning it gets called
once per

fill_super would only work for the init_user_ns from what I can see.

quoted

key, creates the root dentry then triggers the notifier which
instantiates all the namespaced entries.  We can still use
simple_pin_fs for this because there's no locking across
fill_super.
This would mean fill_super would be called the first time the
securityfs is mounted inside the namespace.

I guess I would need to know how fill_super would work or how it
could be called late/delayed as well.

So it would be called early in the init_user_ns by non-namespaced
consumers of securityfs, like it is now.

Namespaced consumers wouldn't call any securityfs_ns_create callbacks
to create dentries until they were notified from the fill_super
notifier, which would now only be triggered on first mount of
securityfs inside the namespace.

quoted

If we do it this way, we can now make securityfs have its own mount
and mount_count inside the user namespace, which it uses internally
to the securityfs code, thus avoiding exposing them to ima or any
other namespaced consumer.

I also think we now don't need the securityfs_ns_ duplicated
functions because the callback via the notifier chain now ensures
we can usethe namespace they were created in to distinguish between
non namespaced and namespaced entries.

Is there then no need to pass a separate vfsmount * in anymore?

I don't think so no.  It could be entirely managed internally to
securityfs.

quoted

Where would the vfsmount pointer reside? For now it's in
ima_namespace, but it sounds like it should be in a more centralized
place? Should it also be  connected to the user_namespace so we can
pick it up using get_user_ns()?

exactly.  I think struct user_namespace should have two elements gated
by a #ifdef CONFIG_SECURITYFS which are the vfsmount and the
mount_count for passing into simple_pin_fs.


James

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help