Re: [kernel-hardening] Re: [PATCH resend 2/2] userns: control capabilities of some user namespaces
From: Serge E. Hallyn <hidden>
Date: 2017-11-07 03:23:14
Also in:
lkml, netdev
On Mon, Nov 06, 2017 at 09:16:03PM -0500, Daniel Micay wrote:
On Mon, 2017-11-06 at 16:14 -0600, Serge E. Hallyn wrote:
Quoting Daniel Micay (danielmicay-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org):
Substantial added attack surface will never go away as a problem. There aren't a finite number of vulnerabilities to be found.
There's varying levels of usefulness and quality. There is code which I want to be able to use in a container, and code which I can't ever see a reason for using there. The latter, especially if it's also in a staging driver, would be nice to have a toggle to disable. You're not advocating dropping the added attack surface, only adding a way of dealing with an 0day after the fact. Privilege raising 0days can exist anywhere, not just in code which only root in a user namespace can exercise. So from that point of view, ksplice seems a more complete solution. Why not just actually fix the bad code block when we know about it?
That's not what I'm advocating. I only care about it for proactive attack surface reduction downstream. I have no interest in using it to block access to known vulnerabilities.
Finally, it has been well argued that you can gain many new caps from having only a few others. Given that, how could you ever be sure that, if an 0day is found which allows root in a user ns to abuse CAP_NET_ADMIN against the host, just keeping CAP_NET_ADMIN from them would suffice?
I didn't suggest using it that way...
It seems to me that the existing control in /proc/sys/kernel/unprivileged_userns_clone might be the better duct tape in that case.
There's no such thing as unprivileged_userns_clone in mainline.
Hm. I was sure Kees had gotten that in... I guess I was wrong.
The advantage of this over unprivileged_userns_clone in Debian and maybe some other distributions is not giving up unprivileged app containers / sandboxes implemented via user namespaces. For example, Chromium's user namespace sandbox likely only needs to have CAP_SYS_CHROOT. Chromium will be dropping their setuid sandbox, forcing usage of user namespaces to avoid losing the sandbox, which will greatly increase local kernel attack surface on the host by exposing netfilter management, etc. to unprivileged users. The proposed approach isn't necessarily the best way to implement this kind of mitigation, but I think it's filling a real need.
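To make the contrast concrete: the distribution knob referred to above is an all-or-nothing switch, so a program can only ask whether unprivileged user namespaces are available at all. A minimal sketch of such a check follows. It assumes the file holds "0" when unprivileged creation is disabled and a non-zero value otherwise; the function name is hypothetical, and the file does not exist on mainline kernels.

```python
from pathlib import Path
from typing import Optional

# Path quoted in the thread; only present on kernels carrying the distro patch.
USERNS_CLONE_KNOB = Path("/proc/sys/kernel/unprivileged_userns_clone")


def unprivileged_userns_allowed(knob: Path = USERNS_CLONE_KNOB) -> Optional[bool]:
    """Return True/False if the knob exists, or None if the kernel lacks it.

    Assumption: "0" means unprivileged user namespace creation is disabled.
    """
    try:
        value = knob.read_text().strip()
    except FileNotFoundError:
        # Mainline kernel (no such sysctl): the knob gives no answer.
        return None
    return value != "0"
```

A sandbox that only needs CAP_SYS_CHROOT gets no finer-grained answer than this: either user namespaces are usable by unprivileged processes, with every capability inside them, or they are not usable at all.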
I think I definitely prefer what I mentioned in the email to Boris. Basically a "permanent capability bounding set". The normal bounding set gets reset to a full set on every new user_ns creation. In this proposal, it would instead be set to the calling task's permanent capability set, which starts (at boot) full, and which privileged tasks can pull capabilities out of.
-serge
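The difference between the two behaviours can be sketched as a toy model. This is not kernel code: the `Task` class, the function names, and the four-entry capability set are hypothetical, and the email does not spell out what "privileged" means or how the permanent set is inherited, so the model simply uses a boolean flag.

```python
from dataclasses import dataclass

# Illustrative subset only; the real capability list is much longer.
FULL_SET = frozenset(
    {"CAP_CHOWN", "CAP_NET_ADMIN", "CAP_SYS_ADMIN", "CAP_SYS_CHROOT"}
)


@dataclass
class Task:
    # Hypothetical "permanent capability set": full at boot.
    permanent: frozenset = FULL_SET
    # Assumption: stands in for whatever privilege check the proposal would use.
    privileged: bool = False

    def drop_permanent(self, cap: str) -> None:
        """Pull a capability out of the permanent set (privileged tasks only)."""
        if not self.privileged:
            raise PermissionError("only privileged tasks may shrink the set")
        self.permanent = self.permanent - {cap}


def bounding_set_current(task: Task) -> frozenset:
    """Existing behaviour: a new user_ns resets the bounding set to full."""
    return FULL_SET


def bounding_set_proposed(task: Task) -> frozenset:
    """Proposed behaviour: a new user_ns starts from the permanent set."""
    return task.permanent
```

For example, after a privileged task calls `drop_permanent("CAP_NET_ADMIN")`, `bounding_set_current` still returns the full set, while `bounding_set_proposed` no longer contains CAP_NET_ADMIN but keeps CAP_SYS_CHROOT, which is the kind of restriction the Chromium sandbox case would tolerate.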