Re: [PATCH 2/2] Add a new sysctl knob: unprivileged_userfaultfd_user_mode_only
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2020-07-24 13:40:20
Also in: linux-fsdevel, lkml

On Thu, Jul 23, 2020 at 05:13:28PM -0700, Nick Kralevich wrote:
> On Thu, Jul 23, 2020 at 10:30 AM Lokesh Gidra wrote:
>> From the discussion so far it seems that there is a consensus that
>> patch 1/2 in this series should be upstreamed in any case. Is there
>> anything that is pending on that patch?
>
> That's my reading of this thread too.
>
>>> Unless I'm mistaken that you can already enforce bit 1 of the second
>>> parameter of the userfaultfd syscall to be set with seccomp-bpf, this
>>> would be more a question to the Android userland team.
>>>
>>> The question would be: does it ever happen that a seccomp filter
>>> isn't already applied to unprivileged software running without
>>> SYS_CAP_PTRACE capability?
>>
>> Yes. Android uses selinux as our primary sandboxing mechanism. We do
>> use seccomp on a few processes, but we have found that it has a
>> surprisingly high performance cost [1] on arm64 devices so turning it
>> on system wide is not a good option.
>>
>> [1] https://lore.kernel.org/linux-security-module/202006011116.3F7109A@keescook/T/#m82ace19539ac595682affabdf652c0ffa5d27dad
>
> As Jeff mentioned, seccomp is used strategically on Android, but is
> not applied to all processes. It's too expensive and impractical when
> simpler implementations (such as this sysctl) can exist. It's also
> significantly simpler to test a sysctl value for correctness as
> opposed to a seccomp filter.

Given that selinux is already used system-wide on Android, what is wrong with using selinux to control userfaultfd as opposed to seccomp?

>>> If answer is "no" the behavior of the new sysctl in patch 2/2 (in
>>> subject) should be enforceable with minor changes to the BPF
>>> assembly. Otherwise it'd require more changes.
>
> It would be good to understand what these changes are.
>
>>> Why exactly is it preferable to enlarge the surface of attack of the
>>> kernel and take the risk there is a real bug in userfaultfd code (not
>>> just a facilitation of exploiting some other kernel bug) that leads
>>> to a privilege escalation, when you still break 99% of userfaultfd
>>> users, if you set with option "2"?
>
> I can see your point if you think about the feature as a whole.
> However, distributions (such as Android) have specialized knowledge of
> their security environments, and may not want to support the typical
> usages of userfaultfd. For such distributions, providing a mechanism
> to prevent userfaultfd from being useful as an exploit primitive,
> while still allowing the very limited use of userfaultfd for userspace
> faults only, is desirable. Distributions shouldn't be forced into
> supporting 100% of the use cases envisioned by userfaultfd when their
> needs may be more specialized, and this sysctl knob empowers
> distributions to make this choice for themselves.
>
>>> Is the system owner really going to purely run on his systems CRIU
>>> postcopy live migration (which already runs with CAP_SYS_PTRACE) and
>>> nothing else that could break?
>
> This is a great example of a capability which a distribution may not
> want to support, due to distribution specific security policies.
>
>>> Option "2" to me looks with a single possible user, and incidentally
>>> this single user can already enforce model "2" by only tweaking its
>>> seccomp-bpf filters without applying 2/2. It'd be a bug if android
>>> apps runs unprotected by seccomp regardless of 2/2.
>
> Can you elaborate on what bug is present by processes being
> unprotected by seccomp? Seccomp cannot be universally applied on
> Android due to previously mentioned performance concerns. Seccomp is
> used in Android primarily as a tool to enforce the list of allowed
> syscalls, so that such syscalls can be audited before being included
> as part of the Android API.
>
> -- Nick
>
> --
> Nick Kralevich | nnk@google.com
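
For readers trying to picture the seccomp-bpf alternative debated above, here is a rough sketch of a filter that only lets userfaultfd() through when the user-mode-only flag is set. It is an illustration under assumptions, not code from this series: the flag is taken to be the one added by patch 1/2, with the name UFFD_USER_MODE_ONLY and value 1 assumed here; the helper names (uffd_filter, run_filter, check_syscall, uffd_filter_selftest) are hypothetical; the load of the low 32 bits of args[0] assumes a little-endian machine; and a real filter would also have to validate seccomp_data.arch, which is omitted for brevity. The small interpreter handles only the four opcodes used, so the policy can be exercised without installing anything.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/syscall.h>

/* ASSUMPTION: flag name and value as proposed in patch 1/2 of the series. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/* Allow everything, except userfaultfd() without UFFD_USER_MODE_ONLY.
 * NOTE: no seccomp_data.arch check; a production filter needs one. */
static struct sock_filter uffd_filter[] = {
    /* 0: A = syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* 1: not userfaultfd? jump to ALLOW at index 5 */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_userfaultfd, 0, 3),
    /* 2: A = low 32 bits of args[0], the flags (little-endian assumed) */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
    /* 3: flag set? jump to ALLOW, otherwise fall through to EPERM */
    BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, UFFD_USER_MODE_ONLY, 1, 0),
    /* 4: deny with EPERM */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    /* 5: allow */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Toy interpreter for just the opcodes used above (not a full cBPF VM). */
static uint32_t run_filter(const struct sock_filter *f, size_t len,
                           const struct seccomp_data *d)
{
    uint32_t a = 0;

    for (size_t pc = 0; pc < len; pc++) {
        const struct sock_filter *i = &f[pc];

        switch (i->code) {
        case BPF_LD | BPF_W | BPF_ABS:
            memcpy(&a, (const char *)d + i->k, sizeof(a));
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:
            pc += (a == i->k) ? i->jt : i->jf;
            break;
        case BPF_JMP | BPF_JSET | BPF_K:
            pc += (a & i->k) ? i->jt : i->jf;
            break;
        case BPF_RET | BPF_K:
            return i->k;
        }
    }
    return SECCOMP_RET_KILL; /* ran off the end of the program */
}

/* Evaluate the filter for a syscall number and its first argument. */
static uint32_t check_syscall(int nr, uint64_t arg0)
{
    struct seccomp_data d = { .nr = nr, .args = { arg0 } };

    return run_filter(uffd_filter,
                      sizeof(uffd_filter) / sizeof(uffd_filter[0]), &d);
}

/* Self-check of the three interesting cases. */
static void uffd_filter_selftest(void)
{
    assert(check_syscall(__NR_userfaultfd, UFFD_USER_MODE_ONLY) == SECCOMP_RET_ALLOW);
    assert(check_syscall(__NR_userfaultfd, 0) == (SECCOMP_RET_ERRNO | EPERM));
    assert(check_syscall(__NR_getpid, 0) == SECCOMP_RET_ALLOW);
}
```

To actually enforce this, the array would be wrapped in a struct sock_fprog and installed with prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) followed by prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog). Every syscall of the filtered process then passes through the filter, which is the cost the Android side of the thread is objecting to.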