Thread (68 messages) 68 messages, 7 authors, 2021-12-13

Re: [PATCH v5 13/16] ima: Move some IMA policy and filesystem related variables into ima_namespace

From: Christian Brauner <hidden>
Date: 2021-12-11 10:45:43
Also in: linux-security-module, lkml
Subsystem: security subsystem, the rest · Maintainers: Paul Moore, James Morris, "Serge E. Hallyn", Linus Torvalds

On Sat, Dec 11, 2021 at 10:50:26AM +0100, Christian Brauner wrote:
quoted hunk ↗ jump to hunk
On Fri, Dec 10, 2021 at 08:57:11AM -0500, Stefan Berger wrote:
quoted
On 12/10/21 06:32, Christian Brauner wrote:
quoted
On Thu, Dec 09, 2021 at 07:57:02PM -0500, Stefan Berger wrote:
quoted
On 12/9/21 14:11, Christian Brauner wrote:
quoted
  From 1f03dc427c583d5e9ebc9ebe9de77c3c535bbebe Mon Sep 17 00:00:00 2001
From: Christian Brauner <redacted>
Date: Thu, 9 Dec 2021 20:07:02 +0100
Subject: [PATCH] !!!! HERE BE DRAGONS - UNTESTED !!!!

---
   security/integrity/ima/ima_fs.c | 43 +++++++++++++++++++++++++++++----
   1 file changed, 38 insertions(+), 5 deletions(-)
diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
index 583462b29cb5..d5b302b925b8 100644
--- a/security/integrity/ima/ima_fs.c
+++ b/security/integrity/ima/ima_fs.c
@@ -317,10 +317,14 @@ static ssize_t ima_read_policy(char *path)
   static ssize_t ima_write_policy(struct file *file, const char __user *buf,
   				size_t datalen, loff_t *ppos)
   {
-	struct ima_namespace *ns = get_current_ns();
+	struct ima_namespace *ns;
+	struct user_namespace *user_ns;
   	char *data;
   	ssize_t result;
+	user_ns = ima_filp_private(filp);
+	ns = user_ns->ima_ns
+
   	if (datalen >= PAGE_SIZE)
   		datalen = PAGE_SIZE - 1;
@@ -373,26 +377,51 @@ static const struct seq_operations ima_policy_seqops = {
   };
   #endif
+static struct user_namespace *ima_filp_private(struct file *filp)
+{
+	if (!(filp->f_flags & O_WRONLY)) {
+#ifdef CONFIG_IMA_READ_POLICY
+		struct seq_file *seq;
+
+		seq = filp->private_data;
+		return seq->private;
+#endif
+	}
+	return filp->private_data;
+}
+
   /*
    * ima_open_policy: sequentialize access to the policy file
    */
   static int ima_open_policy(struct inode *inode, struct file *filp)
   {
-	struct ima_namespace *ns = get_current_ns();
+	struct user_namespace *user_ns = current_user_ns();
Do we have to take a reference on the user namespace assuming one can open
the file, pass the fd down the hierarchy, and then the user namespace with
the opened file goes away? Or is there anything else that keeps the user
namespace alive?
No, we don't. When ima_policy_open() is called we do current_user_ns()
but that will be guaranteed to be identical to filp->f_cred->user_ns.
And f_cred is a reference that has been taken when the vfs allocated a
struct file for this .open call so won't go away until the last fput.

My proposal is also too complicated, I think.
(The booster is giving me the same side-effects as my second shot so
this looks like two good days of fever and headache. So I'll use that as
an excuse. :))

Your patch series as it stands has a bit of a security issue with those
get_current_ns() calls across differnet file/seq_file operations.

You have to make an architectural decision, I think. I see two sensible
options:
1. The relevant ima_ns that .open/.read/.write operate on is always taken
    to be the ima_ns of the filesystem's userns, i.e.
    sb->s_user_ns->ima_ns.
    This - but I'm not an ima person - makes the most sense to me and the
    semantics are straightforward. If I write to a file to alter some
    policy then I expect the ima namespace of the user namespace to be
    affected that the securityfs instance was mounted in.
2. The relevant ima_ns that .open/.read/.write operate on is always
    taken to be the one of the opener. I don't really like that as that
    gets weird if for some complicated reason the caller is not located
    in the userns the filesystem was mounted in (weird mount propagation
    scenario or sm). It also feels strange to operate on an ima_ns that's
    different from s_user_ns->ima_ns in a securityfs instance.
We have this situation because one can setns() to another mount namespaces
but the data shown by SecurityFS lives in a user namespace, right? And now
we need to decide whether to affect the data in the user namespace  that did
the open (option 2) or to which the SecurityFS belongs to (option 1). If we
were to open a regular file it would be option 1, so we should probably not
break that existing semantic and also choose option 1 unless there one
wasn't allowed to choose the user namespace the SecurityFS files belonged to
then it should be option 2 but then we have file descriptor passing where
'being allowed' can change depending on who is reading/writing a file... Is
A general remark that's always worth repeating: under no circumstances
should the object that you're operating on change from .open to
.read/.write. Consider the following very rough sketch:

	pid = fork():
	if (pid == 0) {
		// create USERNS1
		unshare(CLONE_NEWUSER);
	
		// write an idmapping for getuid() to userns 0
	
		mount("", "/sys/kernel/security", "securityfs", [...]);
	
		int fd_ima = open("/sys/kernel/security/some_ima_file", O_WRONLY);
	
		// Send fd_ima to parent via SCM_RIGHTS
	}
	// Receive fd_ima from child.
	
	write(fd_ima, "SET_OPTION_ON_INIT_USER_NS", ...); 

That example above is an instant security issue if the ima_ns changed
depending on the caller since it means you could open a file in an
unprivileged userns and then funnel it to a random process in
init_user_ns that you control via SCM_RIGHTS and have it do the write()
and instead of USERNS1->ima_ns you'd suddenly have changed
init_user_ns->ima_ns.
And this is what your previous patch with these multiple
get_current_ns() calls allowed.

So the write in the example above must operate on USERNS1->ima_ns.

Now, both option 1 and 2 will ensure that the ima_ns that is operated on
is always the same across .open/.read/.write:
1. securityfs->sb->s_user_ns->ima_ns this is trivially guaranteed
   because the filesystem's user namespace doesn't change no matter
   where you access that filesystem from.
2. securityfs_file->f_cred->user_ns->ima_ns this is guaranteed because
   f_cred stashes the openers credentials.

What I fundamentally dislike about option 2 (f_cred) is - and I tried to
say that in my prior mail - that what ima_ns is operated on depends on
the userns of the process that called .open, i.e. the contents in
securityfs change depending on the opener's userns.

And I think that is fundamentally sloppy semantics. The contents of a
filesystem should generally not depend on the caller if it can be
avoided. We have a few instances where this is the case (e.g. some
procfs sysctls) but especially for securityfs this does not make sense.
quoted
there anything that would prevent us from setns()'ing to that target user
namespace so that we would now see that of a user namespace that we are not
allowed to see?
If you're really worried about someone being able to access a securityfs
instance whose userns doesn't match the userns the securityfs instance
was mounted in there are multiple ways to fix it. The one that I tend to
prefer is:

From e0ff6a8dcc573763568e685dd70d1547efd68df9 Mon Sep 17 00:00:00 2001
From: Christian Brauner <redacted>
Date: Fri, 10 Dec 2021 11:47:37 +0100
Subject: !!!! HERE BE DRAGONS - COMPLETELY UNTESTED !!!!

securityfs: only allow access to securityfs from within same namespace

Limit opening of securityfs files to callers located in the same namespace.

---
 security/inode.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)
diff --git a/security/inode.c b/security/inode.c
index eaccba7017d9..9eaf757c08cb 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -80,6 +80,35 @@ static struct file_system_type fs_type = {
 	.fs_flags =	FS_USERNS_MOUNT,
 };
 
+static int securityfs_permission(struct user_namespace *mnt_userns,
+				 struct inode *inode, int mask)
+{
+	int err;
+
+	err = generic_permission(&init_user_ns, inode, mask);
+	if (!err) {
+		if (inode->i_sb->s_user_ns != current_user_ns())
+			err = -EACCES;
+	}
Thinking about this a better alternative might be (on top of my
previous suggestion):
diff --git a/security/inode.c b/security/inode.c
index 0000b2bd4c0c..a540372e4c8c 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -86,15 +86,15 @@ static struct file_system_type fs_type = {
 static int securityfs_permission(struct user_namespace *mnt_userns,
                                 struct inode *inode, int mask)
 {
-       int err;
-
-       err = generic_permission(&init_user_ns, inode, mask);
-       if (!err) {
-               if (inode->i_sb->s_user_ns != current_user_ns())
-                       err = -EACCES;
-       }
-
-       return err;
+       /*
+        * Restrict access to securityfs to callers whose user
+        * namespace is the same or an ancestor of the user namespace
+        * this securityfs instance was mounted in.
+        */
+       if (!in_userns(current_user_ns(), inode->i_sb->s_user_ns))
+               return -EACCES;
+
+       return generic_permission(&init_user_ns, inode, mask);
 }

 const struct inode_operations securityfs_dir_inode_operations = {
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help