Thread (21 messages) 21 messages, 5 authors, 2023-12-11

Re: Fw: [PATCH] proc: Update inode upon changing task security attribute

From: Munehisa Kamata <hidden>
Date: 2023-12-08 02:14:53
Also in: linux-fsdevel

On Tue, 2023-12-05 14:21:51 -0800, Paul Moore wrote:
On Fri, Dec 1, 2023 at 4:00 PM Munehisa Kamata [off-list ref] wrote:
quoted
On Fri, 2023-12-01 09:30:00 +0000, Alexey Dobriyan wrote:
quoted
On Wed, Nov 29, 2023 at 05:11:22PM -0800, Andrew Morton wrote:
quoted
fyi...

(yuk!)

Begin forwarded message:
Date: Thu, 30 Nov 2023 00:37:04 +0000
From: Munehisa Kamata <redacted>
Subject: [PATCH] proc: Update inode upon changing task security attribute

I'm not clear whether VFS is a better (or worse) place[1] to fix the
problem described below and would like to hear opinion.

If the /proc/[pid] directory is bind-mounted on a system with Smack
enabled, and if the task updates its current security attribute, the task
may lose access to files in its own /proc/[pid] through the mountpoint.

 $ sudo capsh --drop=cap_mac_override --
 # mkdir -p dir
 # mount --bind /proc/$$ dir
 # echo AAA > /proc/$$/task/current         # assuming built-in echo
 # cat /proc/$$/task/current                        # revalidate
 AAA
 # echo BBB > dir/attr/current
 # cat dir/attr/current
 cat: dir/attr/current: Permission denied
 # ls dir/
 ls: cannot access dir/: Permission denied
 # cat /proc/$$/attr/current                        # revalidate
 BBB
 # cat dir/attr/current
 BBB
 # echo CCC > /proc/$$/attr/current
 # cat dir/attr/current
 cat: dir/attr/current: Permission denied

This happens because path lookup doesn't revalidate the dentry of the
/proc/[pid] when traversing the filesystem boundary, so the inode security
blob of the /proc/[pid] doesn't get updated with the new task security
attribute. Then, this may lead security modules to deny an access to the
directory. Looking at the code[2] and the /proc/pid/attr/current entry in
proc man page, seems like the same could happen with SELinux. Though, I
didn't find relevant reports.

The steps above are quite artificial. I actually encountered such an
unexpected denial of access with an in-house application sandbox
framework; each app has its own dedicated filesystem tree where the
process's /proc/[pid] is bind-mounted to and the app enters into via
chroot.

With this patch, writing to /proc/[pid]/attr/current (and its per-security
module variant) updates the inode security blob of /proc/[pid] or
/proc/[pid]/task/[tid] (when pid != tid) with the new attribute.

[1] https://lkml.kernel.org/linux-fsdevel/4A2D15AF.8090000@sun.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/security/selinux/hooks.c#n4220

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Munehisa Kamata <redacted>
---
 fs/proc/base.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index dd31e3b6bf77..bdb7bea53475 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2741,6 +2741,7 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
 {
    struct inode * inode = file_inode(file);
    struct task_struct *task;
+   const char *name = file->f_path.dentry->d_name.name;
    void *page;
    int rv;
@@ -2784,10 +2785,26 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
    if (rv < 0)
            goto out_free;

-   rv = security_setprocattr(PROC_I(inode)->op.lsm,
-                             file->f_path.dentry->d_name.name, page,
-                             count);
+   rv = security_setprocattr(PROC_I(inode)->op.lsm, name, page, count);
    mutex_unlock(&current->signal->cred_guard_mutex);
+
+   /*
+    *  Update the inode security blob in advance if the task's security
+    *  attribute was updated
+    */
+   if (rv > 0 && !strcmp(name, "current")) {
+           struct pid *pid;
+           struct proc_inode *cur, *ei;
+
+           rcu_read_lock();
+           pid = get_task_pid(current, PIDTYPE_PID);
+           hlist_for_each_entry(cur, &pid->inodes, sibling_inodes)
+                   ei = cur;
Should this "break;"? Why is only the last inode in the list updated?
Should it be the first? All of them?
If it picks up the first node, it may end up updating /proc/[pid]/task/[tid]
rather than /proc/[pid] (when pid == tid) and the task may be denied access
to its own /proc/[pid] afterward.

I think updating all of them won't hurt. But, as long as /proc/[pid] is
accessible, the rest of the inodes should be updated upon path lookup via
revalidation as usual.

When pid != tid, it only updates /proc/[pid]/task/[tid] and the thread may
lose an access to /proc/[pid], but I think it's okay as it's a matter of
security policy enforced by security modules. Casey, do you have any
comments here?
quoted
quoted
+           put_pid(pid);
+           pid_update_inode(current, &ei->vfs_inode);
+           rcu_read_unlock();
+   }
I think my thoughts are neatly summarized by Andrew's "yuk!" comment
at the top.  However, before we go too much further on this, can we
get clarification that Casey was able to reproduce this on a stock
upstream kernel?  Last I read in the other thread Casey wasn't seeing
this problem on Linux v6.5.

However, for the moment I'm going to assume this is a real problem, is
there some reason why the existing pid_revalidate() code is not being
called in the bind mount case?  From what I can see in the original
problem report, the path walk seems to work okay when the file is
accessed directly from /proc, but fails when done on the bind mount.
Is there some problem with revalidating dentrys on bind mounts?
Hi Paul,

https://lkml.kernel.org/linux-fsdevel/20090608201745.GO8633@ZenIV.linux.org.uk/

After reading this thread, I have doubt about solving this in VFS.
Honestly, however, I'm not sure if it's entirely relevant today.

I think this behavior itself is not specific to bind mounts. Though,
/proc/[pid] files are a bit special since its inode security blob has to be
updated whenever the task's security attribute changed and it requries
revalidation.

I also considered .d_weak_revalidate, but it only helps when the final
component of the path is the mountpoint on path lookup; a task will need to
explicitly access the mountpoint once before accessing the files under the
directory... I don't think it's a solution.


Thanks,
Munehisa
-- 
paul-moore.com
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help