Re: [RFC 20/20] ima: Setup securityfs_ns for IMA namespace
From: Stefan Berger <stefanb@linux.ibm.com>
Date: 2021-12-01 21:35:16
Also in: linux-integrity, linux-security-module
On 12/1/21 16:11, James Bottomley wrote:
> On Wed, 2021-12-01 at 15:25 -0500, Stefan Berger wrote:
>> On 12/1/21 14:21, James Bottomley wrote:
>>> On Wed, 2021-12-01 at 13:11 -0500, Stefan Berger wrote:
>>>> On 12/1/21 12:56, James Bottomley wrote:
>>> [...]
>>>> I tried this with runc and a user namespace active mapping uid 1000
>>>> on the host to uid 0 in the container. There I run into the problem
>>>> that all of the files and directories without the above work-around
>>>> are mapped to 'nobody', just like all the files in sysfs in this
>>>> case are also mapped to nobody. This code resolved the issue.
>>>
>>> So I applied your patches with the permission shift commented out
>>> and instrumented inode_alloc() to see where it might be failing and
>>> I actually find it all works as expected for me:
>>>
>>> ejb@testdeb:~> unshare -r --user --mount --ima
>>> root@testdeb:~# mount -t securityfs_ns none /sys/kernel/security
>>> root@testdeb:~# ls -l /sys/kernel/security/ima/
>>> total 0
>>> -r--r----- 1 root root 0 Dec 1 19:11 ascii_runtime_measurements
>>> -r--r----- 1 root root 0 Dec 1 19:11 binary_runtime_measurements
>>> -rw------- 1 root root 0 Dec 1 19:11 policy
>>> -r--r----- 1 root root 0 Dec 1 19:11 runtime_measurements_count
>>> -r--r----- 1 root root 0 Dec 1 19:11 violations
>>>
>>> I think your problem is something to do with how runc is installing
>>> the uid/gid mappings. If it's installing them after the security_ns
>>> inodes are created then they get the -1 value (because no mappings
>>> exist in s_user_ns). I can even demonstrate this by forcing unshare
>>> to enter the IMA namespace before writing the mapping values and
>>> I'll see "nobody nogroup" above like you do.
>>
>> I am surprised you get this mapping even after commenting the
>> permission adjustments... it doesn't work for me when I comment them
>> out:
>>
>> [stefanb@ima-ns-dev rootfs]$ unshare -r --user --mount
>> [root@ima-ns-dev rootfs]# mount -t securityfs_ns none /sys/kernel/security/
>> [root@ima-ns-dev rootfs]# cd /sys/kernel/security/ima/
>> [root@ima-ns-dev ima]# ls -l
>> total 0
>> -r--r-----. 1 nobody nobody 0 Dec 1 15:20 ascii_runtime_measurements
>> -r--r-----. 1 nobody nobody 0 Dec 1 15:20 binary_runtime_measurements
>> -rw-------. 1 nobody nobody 0 Dec 1 15:20 policy
>> -r--r-----. 1 nobody nobody 0 Dec 1 15:20 runtime_measurements_count
>> -r--r-----. 1 nobody nobody 0 Dec 1 15:20 violations
>> [root@ima-ns-dev ima]# cat /proc/self/uid_map
>> 0 1000 1
>> [root@ima-ns-dev ima]# cat /proc/self/gid_map
>> 0 1000 1
>>
>> The initialization of securityfs and setup of files and directories
>> happens at the same time as the IMA namespace is created. At this
>> time there are no user mappings available, so that's why I need to
>> make the adjustments 'late'.
>
> There is one other possible difference: To get the correct s_user_ns

I am currently wondering why I cannot re-create your setup while
disabling the remapping...

> on the securityfs_ns mount, the mount namespace itself has to be owned
> by the user namespace ... is runc doing that correctly? I always

Following an strace of 'runc create' I see an unshare(CLONE_NEWUSER) by
a process before it does an
unshare(CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWPID|CLONE_NEWNET),
so this seems to be doing it in the order you suggest.

Also, runc seems to have its own set of struggles. I am not sure we
would be able to ask them to accommodate us to do it 'correctly' - it
doesn't sound so 'easy' for them either to get everything under the
hood:

https://github.com/opencontainers/runc/blob/master/libcontainer/nsenter/nsexec.c#L919

     * In order for this unsharing code to be more extensible we need to split
     * up unshare(CLONE_NEWUSER) and clone() in various ways. The ideal case
     * would be if we did clone(CLONE_NEWUSER) and the other namespaces
     * separately, but because of SELinux issues we cannot really do that. But
     [...]
     * However, if we unshare(2) the user namespace *before* we clone(2), then
     * all hell breaks loose.

sounds like fun

So, I am not quite sure whether I am working around an issue of runc
but for that I would like to first be able to re-create your successful
setup to see what's different.

   Stefan

> forget this detail because unshare does it correctly automatically but
> it means you must unshare the user namespace first and then unshare
> the mount namespace (or do it in the same sys call because the kernel
> will get the correct order).
>
> James
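
The 'nobody' owner in the second listing comes down to how an ID is looked up through uid_map/gid_map. Below is a minimal userspace sketch of that lookup, not the kernel's code. It assumes the three-column map format printed by /proc/self/uid_map (ID inside the namespace, ID outside, range length) and the default overflow ID of 65534, which is what 'ls' displays as 'nobody'. The helper names parse_id_map and id_seen_inside are hypothetical and exist only for this illustration.

```python
# Default value of /proc/sys/kernel/overflowuid; an ID with no mapping is
# reported as this value, which userspace typically resolves to "nobody".
OVERFLOW_ID = 65534


def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside_start, outside_start, length) tuples."""
    extents = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        inside_start, outside_start, length = (int(f) for f in fields)
        extents.append((inside_start, outside_start, length))
    return extents


def id_seen_inside(outside_id, extents):
    """Return the ID as seen inside the namespace, or OVERFLOW_ID if unmapped."""
    for inside_start, outside_start, length in extents:
        if outside_start <= outside_id < outside_start + length:
            return inside_start + (outside_id - outside_start)
    return OVERFLOW_ID


if __name__ == "__main__":
    mapping = parse_id_map("0 1000 1\n")
    print(id_seen_inside(1000, mapping))  # host uid 1000 with the mapping in place
    print(id_seen_inside(0, mapping))     # host root has no entry in this mapping
    print(id_seen_inside(1000, []))       # no mapping written at all
```

With the "0 1000 1" mapping from the transcript, host uid 1000 resolves to 0 inside the namespace, while any ID looked up against an empty map falls back to the overflow ID. That fallback is the situation described above for inodes that are set up before the mappings exist.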