Re: [RFC PATCH v3 3/3] devguard: added device guard for mknod in non-initial userns

From: Christian Brauner <brauner@kernel.org>
Date: 2023-12-15 14:15:42
Also in: bpf, linux-fsdevel, lkml
Subsystem: filesystems (vfs and infrastructure), the rest · Maintainers: Alexander Viro, Christian Brauner, Linus Torvalds

On Fri, Dec 15, 2023 at 02:26:53PM +0100, Michael Weiß wrote:

On 15.12.23 13:31, Christian Brauner wrote:

quoted

On Wed, Dec 13, 2023 at 03:38:13PM +0100, Michael Weiß wrote:

quoted

devguard is a simple LSM to allow CAP_MKNOD in non-initial user
namespace in cooperation of an attached cgroup device program. We
just need to implement the security_inode_mknod() hook for this.
In the hook, we check if the current task is guarded by a device
cgroup using the lately introduced cgroup_bpf_current_enabled()
helper. If so, we strip out SB_I_NODEV from the super block.

Access decisions to those device nodes are then guarded by existing
device cgroups mechanism.

Signed-off-by: Michael Weiß <redacted>
---

I think you misunderstood me... My point was that I believe you don't
need an additional LSM at all and no additional LSM hook. But I might be
wrong. Only a POC would show.

Yeah sorry, I got your point now.

I think I might have had a misconception about how this works.
A bpf LSM program can't easily alter a kernel object such as struct
super_block I've been told.

quoted

Just write a bpf lsm program that strips SB_I_NODEV in the existing
security_sb_set_mnt_opts() call which is guranteed to be called when a
new superblock is created.

This does not work since SB_I_NODEV is a required_iflag in
mount_too_revealing(). This I have already tested when writing the
simple LSM here. So maybe we need to drop SB_I_NODEV from required_flags
there, too. Would that be safe?

Right. I think we might be able to add a new SB_I_MANAGED_DEVICES flag.
__UNTESTED, UNCOMPILED_

diff --git a/fs/namespace.c b/fs/namespace.c
index fbf0e596fcd3..e87cc0320091 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c

@@ -4887,7 +4887,6 @@ static bool mnt_already_visible(struct mnt_namespace *ns,

 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags)
 {
-       const unsigned long required_iflags = SB_I_NOEXEC | SB_I_NODEV;
        struct mnt_namespace *ns = current->nsproxy->mnt_ns;
        unsigned long s_iflags;

@@ -4899,9 +4898,13 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
        if (!(s_iflags & SB_I_USERNS_VISIBLE))
                return false;

-       if ((s_iflags & required_iflags) != required_iflags) {
-               WARN_ONCE(1, "Expected s_iflags to contain 0x%lx\n",
-                         required_iflags);
+       if (!(s_iflags & SB_I_NOEXEC)) {
+               WARN_ONCE(1, "Expected s_iflags to contain SB_I_NOEXEC\n");
+               return true;
+       }
+
+       if (!(s_iflags & (SB_I_NODEV | SB_I_MANAGED_DEVICES))) {
+               WARN_ONCE(1, "Expected s_iflags to contain device access mask\n");
                return true;
        }

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98b7a7a8c42e..6ca0fe922478 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h

@@ -1164,6 +1164,7 @@ extern int send_sigurg(struct fown_struct *fown);
 #define SB_I_USERNS_VISIBLE            0x00000010 /* fstype already mounted */
 #define SB_I_IMA_UNVERIFIABLE_SIGNATURE        0x00000020
 #define SB_I_UNTRUSTED_MOUNTER         0x00000040
+#define SB_I_MANAGED_DEVICES           0x00000080

 #define SB_I_SKIP_SYNC 0x00000100      /* Skip superblock at global sync */
 #define SB_I_PERSB_BDI 0x00000200      /* has a per-sb bdi */

quoted

Store your device access rules in a bpf map or in the sb->s_security
blob (This is where I'm fuzzy and could use a bpf LSM expert's input.).

Then make that bpf lsm program kick in everytime a
security_inode_mknod() and security_file_open() is called and do device
access management in there. Actually, you might need to add one hook
when the actual device that's about to be opened is know. 
This should be where today the device access hooks are called.

And then you should already be done with this. The only thing that you
need is the capable check patch.

You don't need that cgroup_bpf_current_enabled() per se. Device
management could now be done per superblock, and not per task. IOW, you
allowlist a bunch of devices that can be created and opened. Any task
that passes basic permission checks and that passes the bpf lsm program
may create device nodes.

That's a way more natural device management model than making this a per
cgroup thing. Though that could be implemented as well with this.

I would try to write a bpf lsm program that does device access
management with your capable() sysctl patch applied and see how far I
get.

I don't have the time otherwise I'd do it.

I'll give it a try but no promises how fast this will go.

No worries. We're entering the holiday season anyway.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help