Re: [PATCH v1 0/2] Add LSM access controls for io_uring_setup

From: Dmytro Maluka <hidden>
Date: 2023-08-09 17:28:36
Also in: io-uring, lkml, selinux

On 8/9/23 16:49, Paul Moore wrote:

On Wed, Aug 9, 2023 at 7:22 AM Dmytro Maluka [off-list ref] wrote:

quoted

On 8/9/23 02:31, Paul Moore wrote:

quoted

On Tue, Aug 8, 2023 at 4:40 PM Dmytro Maluka [off-list ref] wrote:

quoted

On 11/10/22 22:04, Paul Moore wrote:

quoted

On Thu, Nov 10, 2022 at 12:54 PM Jeffrey Vander Stoep [off-list ref] wrote:

quoted

On Mon, Nov 7, 2022 at 10:17 PM Paul Moore [off-list ref] wrote:

quoted

On Mon, Nov 7, 2022 at 3:58 PM Gil Cukierman [off-list ref] wrote:

quoted

This patchset provides the changes required for controlling access to
the io_uring_setup system call by LSMs. It does this by adding a new
hook to io_uring. It also provides the SELinux implementation for a new
permission, io_uring { setup }, using the new hook.

This is important because existing io_uring hooks only support limiting
the sharing of credentials and access to the sensitive uring_cmd file
op. Users of LSMs may also want the ability to tightly control which
callers can retrieve an io_uring capable fd from the kernel, which is
needed for all subsequent io_uring operations.

It isn't immediately obvious to me why simply obtaining a io_uring fd
from io_uring_setup() would present a problem, as the security
relevant operations that are possible with that io_uring fd *should*
still be controlled by other LSM hooks.  Can you help me understand
what security issue you are trying to resolve with this control?

I think there are a few reasons why we want this particular hook.

1.  It aligns well with how other resources are managed by selinux
where access to the resource is the first control point (e.g. "create"
for files, sockets, or bpf_maps, "prog_load" for bpf programs, and
"open" for perf_event) and then additional functionality or
capabilities require additional permissions.

[NOTE: there were two reply sections in your email, and while similar,
they were not identical; I've trimmed the other for the sake of
clarity]

The resources you mention are all objects which contain some type of
information (either user data, configuration, or program
instructions), with the resulting fd being a handle to those objects.
In the case of io_uring the fd is a handle to the io_uring
interface/rings, which by itself does not contain any information
which is not already controlled by other permissions.

I/O operations which transfer data between the io_uring buffers and
other system objects, e.g. IORING_OP_READV, are still subject to the
same file access controls as those done by the application using
syscalls.  Even the IORING_OP_OPENAT command goes through the standard
VFS code path which means it will trigger the same access control
checks as an open*() done by the application normally.

The 'interesting' scenarios are those where the io_uring operation
servicing credentials, aka personalities, differ from the task
controlling the io_uring.  However in those cases we have the new
io_uring controls to gate these delegated operations.  Passing an
io_uring fd is subject to the fd/use permission like any other fd.

Although perhaps the most relevant to your request is the fact that
the io_uring inode is created using the new(ish) secure anon inode
interface which ensures that the creating task has permission to
create an io_uring.  This io_uring inode label also comes into play
when a task attempts to mmap() the io_uring rings, a critical part of
the io_uring API.

If I'm missing something you believe to be important, please share the details.

quoted

2. It aligns well with how resources are managed on Android. We often
do not grant direct access to resources (like memory buffers).

Accessing the io_uring buffers requires a task to mmap() the io_uring
fd which is controlled by the normal SELinux mmap() access controls.

quoted

3. Attack surface management. One of the primary uses of selinux on
Android is to assess and limit attack surface (e.g.
https://twitter.com/jeffvanderstoep/status/1422771606309335043) . As
io_uring vulnerabilities have made their way through our vulnerability
management system, it's become apparent that it's complicated to
assess the impact. Is a use-after-free reachable? Creating
proof-of-concept exploits takes a lot of time, and often functionality
can be reached by multiple paths. How many of the known io_uring
vulnerabilities would be gated by the existing checks? How many future
ones will be gated by the existing checks? I don't know the answer to
either of these questions and it's not obvious. This hook makes that
initial assessment simple and effective.

It should be possible to deny access to io_uring via the anonymous
inode labels, the mmap() controls, and the fd/use permission.  If you
find a way to do meaningful work with an io_uring fd that can't be
controlled via an existing permission check please let me know.

Thank you a lot for this explanation. However, IMHO we should not
confuse 2 somewhat different problems here:

- protecting io_uring related resources (file descriptors, memory
  buffers) against unauthorized access

- protecting the entire system against potential vulnerabilities in
  io_uring

And while I agree that the existing permission checks should be already
sufficient for the former, I'm not quite sure they are sufficient for
the latter.

...

quoted

I already have a PoC patch [3] adding such LSM hook. But before I try to
submit it for upstream, I'd like to know your opinion on the whole idea.

First please explain how the existing LSM/SELinux control points are
not sufficient for restricting io_uring operations.  I'm looking for a
real program flow that is able to "do meaningful work with an io_uring
fd that can't be controlled via an existing permission check".

As I said at the beginning of my reply, I agree with you that the
existing LSM controls are sufficient for restricting io_uring I/O
operations. That is not my concern here. The concern is: how to (and
do we need to) restrict triggering execution of *any* io_uring code in
kernel, *in addition to* restricting the actual io_uring operations.

If your concern is preventing *any* io_uring code from being executed,
I would suggest simply not enabling io_uring at build time.  If you
need to selectively enable io_uring for some subset of processes, you
will need to make use of one of the options you discussed previously,
e.g. a LSM, seccomp, etc.

From a LSM perspective, I don't believe we want to be in the business
of blocking entire kernel subsystems from execution, rather we want to
provide control points so that admins and users can have better, or
more granular control over the security relevant operations that take
place within the different kernel subsystems.

quoted

In other words, "a real program doing a meaningful work with io_uring"
in this case would mean "an exploit for a real vulnerability in io_uring
code (in the current or any older kernel) which does not require an
access to io_uring operations to be exploited". I don't claim that such
vulnerabilities exist or are likely to be introduced in future kernels.
But I'm neither an io_uring expert nor, more importantly, a security
expert, so I cannot tell with confidence that they are not and we have
nothing to worry about here. So I'm interested in your and others'
opinion on that.

Once again, if you have serious concerns about the security or safety
of an individual kernel subsystem, your best option is to simply build
a kernel without that subsystem enabled.

Thanks for the answer. Yeah, disabling a problematic kernel subsystem at
build time is surely the safest option (and that is what we are already
doing in ChromeOS for io_uring, for that matter), and if we still want
to enable it for a limited subset of processes, it seems the cleanest
option is to use seccomp, rather than to add new ad-hoc LSM hooks for
blocking a specific subsystem.

One of the angles I'm coming from is actually the following:

- Android currently enables io_uring but limits its use to a few
  processes. But the way Android does that is by relying on the existing
  SELinux access controls for io_uring resources [1][2], rather than by
  preventing execution of any io_uring code via seccomp or other means.

  I guess the reason why Android doesn't use seccomp for that is the
  downsides of seccomp which I mentioned previously: in short, seccomp
  is well-suited for selectively denying syscalls for specific
  processes, but not so well-suited for selectively allowing them.

  So one of the questions I'm wondering about is: if Android implemented
  preventing execution of any io_uring code by non-trusted processes
  (via seccomp or any other way), how much would it help to reduce the
  risk of attacks, compared to its current SELinux based solution?

- ChromeOS currently completely disables io_uring in kernel, but we do
  want to allow it for a limited set of processes similarly to Android,
  and we are exploring ways to do it securely. Thus the above
  considerations for Android apply to ChromeOS as well.

[1] https://android-review.git.corp.google.com/c/platform/system/sepolicy/+/2302679
[2] https://android-review.git.corp.google.com/c/platform/system/sepolicy/+/2302679/6/public/te_macros

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help