Re: Fw: [PATCH] proc: Update inode upon changing task security attribute
From: Munehisa Kamata <hidden>
Date: 2023-12-08 02:14:53
Also in:
linux-fsdevel
On Tue, 2023-12-05 14:21:51 -0800, Paul Moore wrote:
On Fri, Dec 1, 2023 at 4:00 PM Munehisa Kamata [off-list ref] wrote:quoted
On Fri, 2023-12-01 09:30:00 +0000, Alexey Dobriyan wrote:quoted
On Wed, Nov 29, 2023 at 05:11:22PM -0800, Andrew Morton wrote:quoted
fyi... (yuk!) Begin forwarded message: Date: Thu, 30 Nov 2023 00:37:04 +0000 From: Munehisa Kamata <redacted> Subject: [PATCH] proc: Update inode upon changing task security attribute I'm not clear whether VFS is a better (or worse) place[1] to fix the problem described below and would like to hear opinion. If the /proc/[pid] directory is bind-mounted on a system with Smack enabled, and if the task updates its current security attribute, the task may lose access to files in its own /proc/[pid] through the mountpoint. $ sudo capsh --drop=cap_mac_override -- # mkdir -p dir # mount --bind /proc/$$ dir # echo AAA > /proc/$$/task/current # assuming built-in echo # cat /proc/$$/task/current # revalidate AAA # echo BBB > dir/attr/current # cat dir/attr/current cat: dir/attr/current: Permission denied # ls dir/ ls: cannot access dir/: Permission denied # cat /proc/$$/attr/current # revalidate BBB # cat dir/attr/current BBB # echo CCC > /proc/$$/attr/current # cat dir/attr/current cat: dir/attr/current: Permission denied This happens because path lookup doesn't revalidate the dentry of the /proc/[pid] when traversing the filesystem boundary, so the inode security blob of the /proc/[pid] doesn't get updated with the new task security attribute. Then, this may lead security modules to deny an access to the directory. Looking at the code[2] and the /proc/pid/attr/current entry in proc man page, seems like the same could happen with SELinux. Though, I didn't find relevant reports. The steps above are quite artificial. I actually encountered such an unexpected denial of access with an in-house application sandbox framework; each app has its own dedicated filesystem tree where the process's /proc/[pid] is bind-mounted to and the app enters into via chroot. With this patch, writing to /proc/[pid]/attr/current (and its per-security module variant) updates the inode security blob of /proc/[pid] or /proc/[pid]/task/[tid] (when pid != tid) with the new attribute. [1] https://lkml.kernel.org/linux-fsdevel/4A2D15AF.8090000@sun.com/ [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/security/selinux/hooks.c#n4220 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Munehisa Kamata <redacted> --- fs/proc/base.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-)diff --git a/fs/proc/base.c b/fs/proc/base.c index dd31e3b6bf77..bdb7bea53475 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c@@ -2741,6 +2741,7 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf, { struct inode * inode = file_inode(file); struct task_struct *task; + const char *name = file->f_path.dentry->d_name.name; void *page; int rv;@@ -2784,10 +2785,26 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf, if (rv < 0) goto out_free; - rv = security_setprocattr(PROC_I(inode)->op.lsm, - file->f_path.dentry->d_name.name, page, - count); + rv = security_setprocattr(PROC_I(inode)->op.lsm, name, page, count); mutex_unlock(¤t->signal->cred_guard_mutex); + + /* + * Update the inode security blob in advance if the task's security + * attribute was updated + */ + if (rv > 0 && !strcmp(name, "current")) { + struct pid *pid; + struct proc_inode *cur, *ei; + + rcu_read_lock(); + pid = get_task_pid(current, PIDTYPE_PID); + hlist_for_each_entry(cur, &pid->inodes, sibling_inodes) + ei = cur;Should this "break;"? Why is only the last inode in the list updated? Should it be the first? All of them?If it picks up the first node, it may end up updating /proc/[pid]/task/[tid] rather than /proc/[pid] (when pid == tid) and the task may be denied access to its own /proc/[pid] afterward. I think updating all of them won't hurt. But, as long as /proc/[pid] is accessible, the rest of the inodes should be updated upon path lookup via revalidation as usual. When pid != tid, it only updates /proc/[pid]/task/[tid] and the thread may lose an access to /proc/[pid], but I think it's okay as it's a matter of security policy enforced by security modules. Casey, do you have any comments here?quoted
quoted
+ put_pid(pid); + pid_update_inode(current, &ei->vfs_inode); + rcu_read_unlock(); + }I think my thoughts are neatly summarized by Andrew's "yuk!" comment at the top. However, before we go too much further on this, can we get clarification that Casey was able to reproduce this on a stock upstream kernel? Last I read in the other thread Casey wasn't seeing this problem on Linux v6.5. However, for the moment I'm going to assume this is a real problem, is there some reason why the existing pid_revalidate() code is not being called in the bind mount case? From what I can see in the original problem report, the path walk seems to work okay when the file is accessed directly from /proc, but fails when done on the bind mount. Is there some problem with revalidating dentrys on bind mounts?
Hi Paul, https://lkml.kernel.org/linux-fsdevel/20090608201745.GO8633@ZenIV.linux.org.uk/ After reading this thread, I have doubt about solving this in VFS. Honestly, however, I'm not sure if it's entirely relevant today. I think this behavior itself is not specific to bind mounts. Though, /proc/[pid] files are a bit special since its inode security blob has to be updated whenever the task's security attribute changed and it requries revalidation. I also considered .d_weak_revalidate, but it only helps when the final component of the path is the mountpoint on path lookup; a task will need to explicitly access the mountpoint once before accessing the files under the directory... I don't think it's a solution. Thanks, Munehisa
-- paul-moore.com