Re: [RFC PATCH v3 3/3] devguard: added device guard for mknod in non-initial userns
From: Christian Brauner <brauner@kernel.org>
Date: 2023-12-15 14:15:42
Also in: bpf, linux-fsdevel, lkml
Subsystem: filesystems (vfs and infrastructure), the rest
Maintainers: Alexander Viro, Christian Brauner, Linus Torvalds
On Fri, Dec 15, 2023 at 02:26:53PM +0100, Michael Weiß wrote:
> On 15.12.23 13:31, Christian Brauner wrote:
>> On Wed, Dec 13, 2023 at 03:38:13PM +0100, Michael Weiß wrote:
>>> devguard is a simple LSM to allow CAP_MKNOD in a non-initial user
>>> namespace in cooperation with an attached cgroup device program. We
>>> just need to implement the security_inode_mknod() hook for this. In the
>>> hook, we check if the current task is guarded by a device cgroup using
>>> the recently introduced cgroup_bpf_current_enabled() helper. If so, we
>>> strip SB_I_NODEV from the super block.
>>>
>>> Access decisions to those device nodes are then guarded by the existing
>>> device cgroup mechanism.
>>>
>>> Signed-off-by: Michael Weiß <redacted>
>>> ---
>>
>> I think you misunderstood me... My point was that I believe you don't
>> need an additional LSM at all, nor an additional LSM hook. But I might
>> be wrong. Only a POC would show.
>
> Yeah sorry, I got your point now.
>
> I think I had a misconception about how this works. I've been told that
> a bpf LSM program can't easily alter a kernel object such as
> struct super_block.
>
>> Just write a bpf lsm program that strips SB_I_NODEV in the existing
>> security_sb_set_mnt_opts() call, which is guaranteed to be called when
>> a new superblock is created.
>
> This does not work since SB_I_NODEV is one of the required_iflags in
> mount_too_revealing(). I already tested this when writing the simple
> LSM here. So maybe we need to drop SB_I_NODEV from required_iflags
> there, too. Would that be safe?
Right. I think we might be able to add a new SB_I_MANAGED_DEVICES flag.

_UNTESTED, UNCOMPILED_
diff --git a/fs/namespace.c b/fs/namespace.c
index fbf0e596fcd3..e87cc0320091 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -4887,7 +4887,6 @@ static bool mnt_already_visible(struct mnt_namespace *ns,
 
 static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags)
 {
-	const unsigned long required_iflags = SB_I_NOEXEC | SB_I_NODEV;
 	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
 	unsigned long s_iflags;
 
@@ -4899,9 +4898,13 @@ static bool mount_too_revealing(const struct super_block *sb, int *new_mnt_flags
 	if (!(s_iflags & SB_I_USERNS_VISIBLE))
 		return false;
 
-	if ((s_iflags & required_iflags) != required_iflags) {
-		WARN_ONCE(1, "Expected s_iflags to contain 0x%lx\n",
-			  required_iflags);
+	if (!(s_iflags & SB_I_NOEXEC)) {
+		WARN_ONCE(1, "Expected s_iflags to contain SB_I_NOEXEC\n");
+		return true;
+	}
+
+	if (!(s_iflags & (SB_I_NODEV | SB_I_MANAGED_DEVICES))) {
+		WARN_ONCE(1, "Expected s_iflags to contain device access mask\n");
 		return true;
 	}
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 98b7a7a8c42e..6ca0fe922478 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1164,6 +1164,7 @@ extern int send_sigurg(struct fown_struct *fown);
 #define SB_I_USERNS_VISIBLE		0x00000010 /* fstype already mounted */
 #define SB_I_IMA_UNVERIFIABLE_SIGNATURE	0x00000020
 #define SB_I_UNTRUSTED_MOUNTER		0x00000040
+#define SB_I_MANAGED_DEVICES		0x00000080
 
 #define SB_I_SKIP_SYNC	0x00000100	/* Skip superblock at global sync */
 #define SB_I_PERSB_BDI	0x00000200	/* has a per-sb bdi */
>> Store your device access rules in a bpf map or in the sb->s_security
>> blob (this is where I'm fuzzy and could use a bpf LSM expert's input).
>>
>> Then make that bpf lsm program kick in every time security_inode_mknod()
>> and security_file_open() are called, and do device access management in
>> there. Actually, you might need to add one hook at the point where the
>> actual device that's about to be opened is known. That should be where
>> the device access hooks are called today.
>>
>> And then you should already be done with this. The only thing you need
>> is the capable check patch. You don't need cgroup_bpf_current_enabled()
>> per se. Device management could now be done per superblock, not per
>> task.
>>
>> IOW, you allowlist a bunch of devices that can be created and opened.
>> Any task that passes the basic permission checks and the bpf lsm
>> program may create device nodes.
>>
>> That's a far more natural device management model than making this a
>> per-cgroup thing, though that could be implemented with this as well.
>>
>> I would try to write a bpf lsm program that does device access
>> management with your capable() sysctl patch applied and see how far I
>> get. I don't have the time, otherwise I'd do it myself.
>
> I'll give it a try, but no promises on how fast this will go.
No worries. We're entering the holiday season anyway.