Re: [PATCH v5 13/16] ima: Move some IMA policy and filesystem related variables into ima_namespace
From: Christian Brauner <hidden>
Date: 2021-12-11 10:45:43
Also in:
linux-security-module, lkml
Subsystem:
security subsystem, the rest · Maintainers:
Paul Moore, James Morris, "Serge E. Hallyn", Linus Torvalds
On Sat, Dec 11, 2021 at 10:50:26AM +0100, Christian Brauner wrote:
quoted hunk ↗ jump to hunk
On Fri, Dec 10, 2021 at 08:57:11AM -0500, Stefan Berger wrote:quoted
On 12/10/21 06:32, Christian Brauner wrote:quoted
On Thu, Dec 09, 2021 at 07:57:02PM -0500, Stefan Berger wrote:quoted
On 12/9/21 14:11, Christian Brauner wrote:quoted
From 1f03dc427c583d5e9ebc9ebe9de77c3c535bbebe Mon Sep 17 00:00:00 2001 From: Christian Brauner <redacted> Date: Thu, 9 Dec 2021 20:07:02 +0100 Subject: [PATCH] !!!! HERE BE DRAGONS - UNTESTED !!!! --- security/integrity/ima/ima_fs.c | 43 +++++++++++++++++++++++++++++---- 1 file changed, 38 insertions(+), 5 deletions(-)diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c index 583462b29cb5..d5b302b925b8 100644 --- a/security/integrity/ima/ima_fs.c +++ b/security/integrity/ima/ima_fs.c@@ -317,10 +317,14 @@ static ssize_t ima_read_policy(char *path) static ssize_t ima_write_policy(struct file *file, const char __user *buf, size_t datalen, loff_t *ppos) { - struct ima_namespace *ns = get_current_ns(); + struct ima_namespace *ns; + struct user_namespace *user_ns; char *data; ssize_t result; + user_ns = ima_filp_private(filp); + ns = user_ns->ima_ns + if (datalen >= PAGE_SIZE) datalen = PAGE_SIZE - 1;@@ -373,26 +377,51 @@ static const struct seq_operations ima_policy_seqops = { }; #endif +static struct user_namespace *ima_filp_private(struct file *filp) +{ + if (!(filp->f_flags & O_WRONLY)) { +#ifdef CONFIG_IMA_READ_POLICY + struct seq_file *seq; + + seq = filp->private_data; + return seq->private; +#endif + } + return filp->private_data; +} + /* * ima_open_policy: sequentialize access to the policy file */ static int ima_open_policy(struct inode *inode, struct file *filp) { - struct ima_namespace *ns = get_current_ns(); + struct user_namespace *user_ns = current_user_ns();Do we have to take a reference on the user namespace assuming one can open the file, pass the fd down the hierarchy, and then the user namespace with the opened file goes away? Or is there anything else that keeps the user namespace alive?No, we don't. When ima_policy_open() is called we do current_user_ns() but that will be guaranteed to be identical to filp->f_cred->user_ns. And f_cred is a reference that has been taken when the vfs allocated a struct file for this .open call so won't go away until the last fput. My proposal is also too complicated, I think. (The booster is giving me the same side-effects as my second shot so this looks like two good days of fever and headache. So I'll use that as an excuse. :)) Your patch series as it stands has a bit of a security issue with those get_current_ns() calls across differnet file/seq_file operations. You have to make an architectural decision, I think. I see two sensible options: 1. The relevant ima_ns that .open/.read/.write operate on is always taken to be the ima_ns of the filesystem's userns, i.e. sb->s_user_ns->ima_ns. This - but I'm not an ima person - makes the most sense to me and the semantics are straightforward. If I write to a file to alter some policy then I expect the ima namespace of the user namespace to be affected that the securityfs instance was mounted in. 2. The relevant ima_ns that .open/.read/.write operate on is always taken to be the one of the opener. I don't really like that as that gets weird if for some complicated reason the caller is not located in the userns the filesystem was mounted in (weird mount propagation scenario or sm). It also feels strange to operate on an ima_ns that's different from s_user_ns->ima_ns in a securityfs instance.We have this situation because one can setns() to another mount namespaces but the data shown by SecurityFS lives in a user namespace, right? And now we need to decide whether to affect the data in the user namespace that did the open (option 2) or to which the SecurityFS belongs to (option 1). If we were to open a regular file it would be option 1, so we should probably not break that existing semantic and also choose option 1 unless there one wasn't allowed to choose the user namespace the SecurityFS files belonged to then it should be option 2 but then we have file descriptor passing where 'being allowed' can change depending on who is reading/writing a file... IsA general remark that's always worth repeating: under no circumstances should the object that you're operating on change from .open to .read/.write. Consider the following very rough sketch: pid = fork(): if (pid == 0) { // create USERNS1 unshare(CLONE_NEWUSER); // write an idmapping for getuid() to userns 0 mount("", "/sys/kernel/security", "securityfs", [...]); int fd_ima = open("/sys/kernel/security/some_ima_file", O_WRONLY); // Send fd_ima to parent via SCM_RIGHTS } // Receive fd_ima from child. write(fd_ima, "SET_OPTION_ON_INIT_USER_NS", ...); That example above is an instant security issue if the ima_ns changed depending on the caller since it means you could open a file in an unprivileged userns and then funnel it to a random process in init_user_ns that you control via SCM_RIGHTS and have it do the write() and instead of USERNS1->ima_ns you'd suddenly have changed init_user_ns->ima_ns. And this is what your previous patch with these multiple get_current_ns() calls allowed. So the write in the example above must operate on USERNS1->ima_ns. Now, both option 1 and 2 will ensure that the ima_ns that is operated on is always the same across .open/.read/.write: 1. securityfs->sb->s_user_ns->ima_ns this is trivially guaranteed because the filesystem's user namespace doesn't change no matter where you access that filesystem from. 2. securityfs_file->f_cred->user_ns->ima_ns this is guaranteed because f_cred stashes the openers credentials. What I fundamentally dislike about option 2 (f_cred) is - and I tried to say that in my prior mail - that what ima_ns is operated on depends on the userns of the process that called .open, i.e. the contents in securityfs change depending on the opener's userns. And I think that is fundamentally sloppy semantics. The contents of a filesystem should generally not depend on the caller if it can be avoided. We have a few instances where this is the case (e.g. some procfs sysctls) but especially for securityfs this does not make sense.quoted
there anything that would prevent us from setns()'ing to that target user namespace so that we would now see that of a user namespace that we are not allowed to see?If you're really worried about someone being able to access a securityfs instance whose userns doesn't match the userns the securityfs instance was mounted in there are multiple ways to fix it. The one that I tend to prefer is: From e0ff6a8dcc573763568e685dd70d1547efd68df9 Mon Sep 17 00:00:00 2001 From: Christian Brauner <redacted> Date: Fri, 10 Dec 2021 11:47:37 +0100 Subject: !!!! HERE BE DRAGONS - COMPLETELY UNTESTED !!!! securityfs: only allow access to securityfs from within same namespace Limit opening of securityfs files to callers located in the same namespace. --- security/inode.c | 33 +++++++++++++++++++++++++++++++-- 1 file changed, 31 insertions(+), 2 deletions(-)diff --git a/security/inode.c b/security/inode.c index eaccba7017d9..9eaf757c08cb 100644 --- a/security/inode.c +++ b/security/inode.c@@ -80,6 +80,35 @@ static struct file_system_type fs_type = { .fs_flags = FS_USERNS_MOUNT, }; +static int securityfs_permission(struct user_namespace *mnt_userns, + struct inode *inode, int mask) +{ + int err; + + err = generic_permission(&init_user_ns, inode, mask); + if (!err) { + if (inode->i_sb->s_user_ns != current_user_ns()) + err = -EACCES; + }
Thinking about this a better alternative might be (on top of my previous suggestion):
diff --git a/security/inode.c b/security/inode.c
index 0000b2bd4c0c..a540372e4c8c 100644
--- a/security/inode.c
+++ b/security/inode.c@@ -86,15 +86,15 @@ static struct file_system_type fs_type = { static int securityfs_permission(struct user_namespace *mnt_userns, struct inode *inode, int mask) { - int err; - - err = generic_permission(&init_user_ns, inode, mask); - if (!err) { - if (inode->i_sb->s_user_ns != current_user_ns()) - err = -EACCES; - } - - return err; + /* + * Restrict access to securityfs to callers whose user + * namespace is the same or an ancestor of the user namespace + * this securityfs instance was mounted in. + */ + if (!in_userns(current_user_ns(), inode->i_sb->s_user_ns)) + return -EACCES; + + return generic_permission(&init_user_ns, inode, mask); } const struct inode_operations securityfs_dir_inode_operations = {