Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
From: Serge E. Hallyn <hidden>
Date: 2017-11-09 03:21:38
Also in:
linux-api, lkml
On Thu, Nov 09, 2017 at 09:55:41AM +0900, Mahesh Bandewar (महेश बंडेवार) wrote:
On Thu, Nov 9, 2017 at 4:02 AM, Christian Brauner [off-list ref] wrote:
On Wed, Nov 08, 2017 at 03:09:59AM -0800, Mahesh Bandewar (महेश बंडेवार) wrote:
Sorry folks, I was traveling and it seems like a lot happened on this thread. :p I will try to respond to a few of these comments selectively:
> The thing that makes me hesitate with this set is that it is a permanent new feature to address what (I hope) is a temporary problem.

I agree this is a permanent new feature, but it's not solving a temporary problem. It's impossible to predict what new vulnerability could show up, or when. I think Daniel summed it up appropriately in his response.
> Seems like there are two naive ways to do it, the first being to just look at all code under ns_capable() plus code called from there. It seems like looking at the result of that could be fruitful.

This is really hard. The main issue is that there were features designed and developed before the user-ns days on the assumption that unprivileged users would never get certain capabilities that only the root user had. That is no longer true now that any process can create a user-ns with root mapped. At the same time, blocking user-ns creation for everyone is a big hammer that is not needed either. So it's not that easy to just perform a code walk-through and correct those decisions now.
> It seems to me that the existing control in /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape in that case.

This solution essentially blocks unprivileged users from using user namespaces entirely. That is not really a solution that can work. The solution that this patch-set adds allows unprivileged users to create user namespaces. In fact, the proposed solution is a more fine-grained approach than the unprivileged_userns_clone solution, since you can selectively block capabilities rather than completely blocking the functionality.

I've been talking to Stéphane today about this and we should also keep in mind that we have:

    chb@conventiont|~ > ls -al /proc/sys/user/
    total 0
    dr-xr-xr-x 1 root root 0 Nov  6 23:32 .
    dr-xr-xr-x 1 root root 0 Nov  2 22:13 ..
    -rw-r--r-- 1 root root 0 Nov  8 19:48 max_cgroup_namespaces
    -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_instances
    -rw-r--r-- 1 root root 0 Nov  8 19:48 max_inotify_watches
    -rw-r--r-- 1 root root 0 Nov  8 19:48 max_ipc_namespaces
    -rw-r--r-- 1 root root 0 Nov  8 19:48 max_mnt_namespaces
    -rw-r--r-- 1 root root 0 Nov  8 19:48 max_net_namespaces
    -rw-r--r-- 1 root root 0 Nov  8 19:48 max_pid_namespaces
    -rw-r--r-- 1 root root 0 Nov  8 19:48 max_user_namespaces
    -rw-r--r-- 1 root root 0 Nov  8 19:48 max_uts_namespaces

These files allow you to limit the number of namespaces that can be created *per namespace* type. So let's say your system runs a bunch of user namespaces, you can do:

    chb@conventiont|~ > echo 0 > /proc/sys/user/max_user_namespaces

So that the next time you try to create a user namespace you'd see:

    chb@conventiont|~ > unshare -U
    unshare: unshare failed: No space left on device

So there's not even a need to upstream a new sysctl since we have ways of blocking this.

I'm not sure how that solves the problem my patch-set is addressing. I agree, though, that the need for the unprivileged_userns_clone sysctl goes away, as this is equivalent to setting that sysctl to 0 as you have described above.
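
For reference, the per-type limits in the /proc/sys/user listing can also be read programmatically. What follows is only a minimal Python sketch, assuming a Linux system that exposes those files. The helper names read_namespace_limits and userns_creation_blocked are made up for illustration and are not part of the patch-set or of any existing tool. The "No space left on device" message from unshare corresponds to errno ENOSPC.

```python
from pathlib import Path


def read_namespace_limits(sysctl_dir="/proc/sys/user"):
    """Return {file name: limit} for each max_*_namespaces file found.

    sysctl_dir is a parameter (hypothetical, for illustration) so the
    function can be pointed at a test directory instead of /proc.
    """
    limits = {}
    base = Path(sysctl_dir)
    if not base.is_dir():
        # Not Linux, or the kernel does not expose these limits.
        return limits
    for entry in base.glob("max_*_namespaces"):
        try:
            limits[entry.name] = int(entry.read_text().strip())
        except (OSError, ValueError):
            # Skip files that are unreadable or do not hold an integer.
            continue
    return limits


def userns_creation_blocked(limits):
    """True if the user namespace limit is present and set to 0."""
    return limits.get("max_user_namespaces") == 0
```

Calling userns_creation_blocked(read_namespace_limits()) after the echo 0 step would be expected to return True. This only reports the all-or-nothing limit; it says nothing about which capabilities a namespace's processes hold, which is the part the patch-set is concerned with.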
oh right that was the reasoning iirc for not needing the other sysctl.
However, as I mentioned earlier, blocking processes from creating user namespaces is not the solution. Processes should be able to create namespaces as they are designed to, but at the same time we need to have controls to 'contain' them if the need arises. Setting max_no to 0 is not the solution that I'm looking for since it doesn't solve the problem.
well yesterday we were told that was explicitly not the goal, but that was not by you ... i just mention it to explain why we seem to be walking in circles a bit.

anyway the bounding set doesn't actually make sense so forget that. the question then is just whether it makes sense to allow things to continue at all in this situation.

would you mind indulging me by giving one or two concrete examples in the previous known cves of what capabilities you would have dropped to allow the rest to continue to be safely used?

thanks,
serge