Re: Commit 13c164b1a186 - regression for LSMs/SELinux? | linux-security-module

Re: Commit 13c164b1a186 - regression for LSMs/SELinux?

From: Ondrej Mosnacek <omosnace@redhat.com>
Date: 2020-09-25 13:37:56
Also in: autofs, selinux

On Thu, Sep 24, 2020 at 4:16 PM Stephen Smalley
[off-list ref] wrote:

On Thu, Sep 24, 2020 at 4:36 AM Ondrej Mosnacek [off-list ref] wrote:

quoted

On Wed, Sep 23, 2020 at 3:55 AM Ian Kent [off-list ref] wrote:

quoted

On Tue, 2020-09-22 at 09:33 +0200, Ondrej Mosnacek wrote:

quoted

On Mon, Sep 21, 2020 at 6:30 PM Al Viro [off-list ref]
wrote:

quoted

On Mon, Sep 21, 2020 at 06:09:22PM +0200, Christoph Hellwig wrote:

quoted

[adding Linus and Al]

On Mon, Sep 21, 2020 at 04:51:35PM +0200, Ondrej Mosnacek wrote:

quoted

Hi folks,

It seems that after commit 13c164b1a186 ("autofs: switch to
kernel_write") there is now an extra LSM permission required
(for the
current task to write to the automount pipe) for processes
accessing
some yet-to-to-be mounted directory on which an autofs mount is
set
up. The call chain is:
[...]
autofs_wait() ->
autofs_notify_daemon() ->
autofs_write() ->
kernel_write() ->
rw_verify_area() ->
security_file_permission()

The bug report that led me to this commit is at [1].

Technically, this is a regression for LSM users, since this is
a
kernel-internal operation and an LSM permission for the current
task
shouldn't be required. Can this patch be reverted? Perhaps
__kernel_{read|write}() could instead be renamed to
kernel_*_nocheck()
so that the name is more descriptive?

So we obviously should not break existing user space and need to
fix
this ASAP.  The trivial "fix" would be to export __kernel_write
again
and switch autofs to use it.  The other option would be a FMODE
flag
to bypass security checks, only to be set if the callers ensures
they've been valided (i.e. in autofs_prepare_pipe).

IMHO that sounds like an overkill in this scenario. I don't think it
makes sense to do the LSM check here (or at least not against the
current task's creds), because it is not the current task that wants
to communicate with the daemon, it just wants to to access some
directory on the system that just happens to be special to the
kernel,
which needs to do some communication on the side to service this
request. So if we do want to do any LSM check here, there should at
least be some "bool internal" flag passed to the LSM, signalizing
that
this is an internal read/write operation that wasn't directly
initiated/requested by the current process. SELinux could then either
use the kernel secid instead of the current task's secid or skip the
check completely in such case.

Perhaps, but see below.

quoted

I'd like Stephen to weigh in on this, but it looks he might be on
vacation right now...

quoted

Any opinions?

Reexport for now.  Incidentally, what is LSM doing rejecting writes
into a pipe?

With SELinux at least, what is allowed or denied is defined in the
policy. And the policy usually defaults to everything denied and then
you add rules to allow what needs (and makes sense) to be allowed.
Since until kernel 5.8 random processes didn't need to write to pipes
created by the automount daemon, it has never been explicitly allowed
and so the automounting now fails. It is in no way obvious that all
processes should have the permission to talk to the automount daemon
just to traverse the filesystem...

I think you might have misunderstood what lead to this, just a bit.

Previously the __kern_write() function was used for this communication
and Christoph's patch changed that to use kern_write() instead.

In theory that's a good idea because kern_write() adds some additional
sanity checks, one being a call to rw_verify_area() which is where the
security_file_permission() call fails.

So previously any random process could avoid these checks by calling
__kern_write() so the change to kern_write() is, in theory, that's a
good thing and simply reverting that hunk in Christoph's patch
probably isn't the best thing to do.

I understand that and I'm not proposing the revert as a long-term fix.
For a long-term solution I propose using kernel_write() and extending
it to allow the caller to suppress (just) the
security_file_permission() call. Then each caller would have to decide
whether the LSM check makes sense in that situation or not. It seems
safer against future mistakes than leaving it up to the caller to call
security_file_permission() explicitly (I predict that no new user
would even realize that the call might be needed).

quoted

But any random process does need to be able to write to the automount
daemon pipe for trailing path components and the root dentry of autofs
mounts, depending on case.

So it's true that any write to any autofs dentry probably doesn't
need to be allowed but I question what that gets us in terms of
security improvement over allowing pipe writes for automount_t
labelled pipes in selinux policy since they must be within an autofs
mounted file system.

The difference is not in security, but in usability. The kernel
communicating with the autofs daemon is an internal detail that
shouldn't need special rules in policy. Even if we want to do any LSM
checking here, the subject should be kernel_t, not the current running
process. The process doesn't have any control on whether the kernel
does the communication and it doesn't control the content of the
communication, so the permission check as it is doesn't make any
sense. People writing the policy should be burdened by low-level
details about how the kernel works internally as little as possible.

quoted

But Stephen has a different recommendation (and that appears to
consider the cause I outlined above) so I'll wait to see what others
think about the recommendations.

As I said above, I think Stephen's approach is less future-proof. And
it seems that rw_verify_area() has many other callers, most/all of
which probably service actual requests from userspace and we'd need to
retain the security_file_permission() call in those cases.

Let me try to put my proposal into a patch, so we have something
concrete to talk about...

Up-thread I thought Linus indicated he didn't really want a flag to
disable pemission checking due to potential abuse (and I agree).

IIUC he was against adding an FMODE flag, while I was rather
suggesting a new function parameter (I realize it probably wasn't
clear from what I wrote). The latter is less prone to accidents and
any new user would have to set that parameter to *something* and so
would be forced to think about whether he needs an LSM check or not.
Whereas with a separate _nosec()/nolsmcheck()/_whatever() function
they would likely just use the kernel_write() function they already
know and may not even realize that there is some decision to be made.

But that's just my opinion, I would still be happier with a separate
function than reverting to __kernel_write().

Historically we have taken one of two approaches for these situations:
1) Provide a separate interface like kernel_write() for use when we
don't want permission checking and don't have it call the security
hook at all.  If you prefer kernel_write_nosec() that's fine but I
think that's somewhat implicit in the fact that it is a
kernel-initiated write, not a userspace write (which would hopefully
go through vfs_write() or similar and end up calling the hook).

Unfortunately not all of the kernel_write() calls seem to be of the
kind that we want to skip the permission checking. One in particular
that caught my attention is this btrfs one:
https://elixir.bootlin.com/linux/v5.9-rc6/source/fs/btrfs/send.c#L561

Basically a userspace process passes some fd via ioctl and kernel then
writes something in it. If we don't do any check here (and BTW we in
fact want the current's creds to apply here), then the process could
pass any open fd and bypass any LSM write permissions to the file
behind that fd. Possibly there are some checks that restrict what can
be passed there (I didn't check), but it would still be fragile to
rely on them.

So I guess we can't just skip the check whenever the kernel_*()
function is called.

2) Temporarily override creds to the init_cred or the result of
prepare_kernel_creds() before calling any credential-checking
functions and then revert creds afterward.

Most users of kernel_write() seem to also create/open the file
themselves, usually using the current's creds, so this would break
consistency when applied universally. Autofs (and also
bpfilter_send_req()) are special in that they first get the fd from a
userspace process and then write into it from other tasks' syscalls.
Then there are one or two users that create the file using
shmem_kernel_file_setup(), which does use prepare_kernel_creds(), but
the resulting file is marked S_PRIVATE, so all LSM checks are already
skipped for it anyway.

The problem with #2 is that it still requires that the policy allow
kernel_t (or its equivalent) to be able to write to these pipes, which
wasn't previously necessary and thus might not be allowed in all
policies (e.g. Android?).  #1 avoids any need for policy for these
operations.

Yeah, I'm starting to think checking "kernel creds -> userspace
object" access vectors on purpose doesn't make much sense... you would
pretty much want to always allow that (no point in kernel blocking its
own operations) so just skipping the check in those cases where using
the current's creds doesn't make sense seems to be a better choice.

BTW, I don't have the candidate patch written yet, as I haven't had
time and also need to process new information after having reviewed
the existing callers.

-- 
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help