Re: [PATCH v21 1/6] exec: Add a new AT_EXECVE_CHECK flag to execveat(2)

From: Jeff Xu <hidden>
Date: 2024-11-27 15:15:04
Also in: linux-fsdevel, linux-integrity, linux-security-module, lkml

On Wed, Nov 27, 2024 at 4:07 AM Mickaël Salaün [off-list ref] wrote:

On Mon, Nov 25, 2024 at 09:38:51AM -0800, Jeff Xu wrote:

quoted

On Fri, Nov 22, 2024 at 6:50 AM Mickaël Salaün [off-list ref] wrote:

quoted

On Thu, Nov 21, 2024 at 10:27:40AM -0800, Jeff Xu wrote:

quoted

On Thu, Nov 21, 2024 at 5:40 AM Mickaël Salaün [off-list ref] wrote:

quoted

On Wed, Nov 20, 2024 at 08:06:07AM -0800, Jeff Xu wrote:

quoted

On Wed, Nov 20, 2024 at 1:42 AM Mickaël Salaün [off-list ref] wrote:

quoted

On Tue, Nov 19, 2024 at 05:17:00PM -0800, Jeff Xu wrote:

quoted

On Tue, Nov 12, 2024 at 11:22 AM Mickaël Salaün [off-list ref] wrote:

quoted

Add a new AT_EXECVE_CHECK flag to execveat(2) to check if a file would
be allowed for execution.  The main use case is for script interpreters
and dynamic linkers to check execution permission according to the
kernel's security policy. Another use case is to add context to access
logs e.g., which script (instead of interpreter) accessed a file.  As
any executable code, scripts could also use this check [1].

This is different from faccessat(2) + X_OK which only checks a subset of
access rights (i.e. inode permission and mount options for regular
files), but not the full context (e.g. all LSM access checks).  The main
use case for access(2) is for SUID processes to (partially) check access
on behalf of their caller.  The main use case for execveat(2) +
AT_EXECVE_CHECK is to check if a script execution would be allowed,
according to all the different restrictions in place.  Because the use
of AT_EXECVE_CHECK follows the exact kernel semantic as for a real
execution, user space gets the same error codes.

An interesting point of using execveat(2) instead of openat2(2) is that
it decouples the check from the enforcement.  Indeed, the security check
can be logged (e.g. with audit) without blocking an execution
environment not yet ready to enforce a strict security policy.

LSMs can control or log execution requests with
security_bprm_creds_for_exec().  However, to enforce a consistent and
complete access control (e.g. on binary's dependencies) LSMs should
restrict file executability, or mesure executed files, with
security_file_open() by checking file->f_flags & __FMODE_EXEC.

Because AT_EXECVE_CHECK is dedicated to user space interpreters, it
doesn't make sense for the kernel to parse the checked files, look for
interpreters known to the kernel (e.g. ELF, shebang), and return ENOEXEC
if the format is unknown.  Because of that, security_bprm_check() is
never called when AT_EXECVE_CHECK is used.

It should be noted that script interpreters cannot directly use
execveat(2) (without this new AT_EXECVE_CHECK flag) because this could
lead to unexpected behaviors e.g., `python script.sh` could lead to Bash
being executed to interpret the script.  Unlike the kernel, script
interpreters may just interpret the shebang as a simple comment, which
should not change for backward compatibility reasons.

Because scripts or libraries files might not currently have the
executable permission set, or because we might want specific users to be
allowed to run arbitrary scripts, the following patch provides a dynamic
configuration mechanism with the SECBIT_EXEC_RESTRICT_FILE and
SECBIT_EXEC_DENY_INTERACTIVE securebits.

This is a redesign of the CLIP OS 4's O_MAYEXEC:
https://github.com/clipos-archive/src_platform_clip-patches/blob/f5cb330d6b684752e403b4e41b39f7004d88e561/1901_open_mayexec.patch
This patch has been used for more than a decade with customized script
interpreters.  Some examples can be found here:
https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Kees Cook <redacted>
Cc: Paul Moore <paul@paul-moore.com>
Reviewed-by: Serge Hallyn <serge@hallyn.com>
Link: https://docs.python.org/3/library/io.html#io.open_code [1]
Signed-off-by: Mickaël Salaün <mic@digikod.net>
Link: https://lore.kernel.org/r/20241112191858.162021-2-mic@digikod.net (local)
---

Changes since v20:
* Rename AT_CHECK to AT_EXECVE_CHECK, requested by Amir Goldstein and
  Serge Hallyn.
* Move the UAPI documentation to a dedicated RST file.
* Add Reviewed-by: Serge Hallyn

Changes since v19:
* Remove mention of "role transition" as suggested by Andy.
* Highlight the difference between security_bprm_creds_for_exec() and
  the __FMODE_EXEC check for LSMs (in commit message and LSM's hooks) as
  discussed with Jeff.
* Improve documentation both in UAPI comments and kernel comments
  (requested by Kees).

New design since v18:
https://lore.kernel.org/r/20220104155024.48023-3-mic@digikod.net (local)
---
 Documentation/userspace-api/check_exec.rst | 34 ++++++++++++++++++++++
 Documentation/userspace-api/index.rst      |  1 +
 fs/exec.c                                  | 20 +++++++++++--
 include/linux/binfmts.h                    |  7 ++++-
 include/uapi/linux/fcntl.h                 |  4 +++
 kernel/audit.h                             |  1 +
 kernel/auditsc.c                           |  1 +
 security/security.c                        | 10 +++++++
 8 files changed, 75 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/userspace-api/check_exec.rst

diff --git a/Documentation/userspace-api/check_exec.rst b/Documentation/userspace-api/check_exec.rst
new file mode 100644
index 000000000000..ad1aeaa5f6c0
--- /dev/null
+++ b/Documentation/userspace-api/check_exec.rst

@@ -0,0 +1,34 @@
+===================
+Executability check
+===================
+
+AT_EXECVE_CHECK
+===============
+
+Passing the ``AT_EXECVE_CHECK`` flag to :manpage:`execveat(2)` only performs a
+check on a regular file and returns 0 if execution of this file would be
+allowed, ignoring the file format and then the related interpreter dependencies
+(e.g. ELF libraries, script's shebang).
+
+Programs should always perform this check to apply kernel-level checks against
+files that are not directly executed by the kernel but passed to a user space
+interpreter instead.  All files that contain executable code, from the point of
+view of the interpreter, should be checked.  However the result of this check
+should only be enforced according to ``SECBIT_EXEC_RESTRICT_FILE`` or
+``SECBIT_EXEC_DENY_INTERACTIVE.``.

Regarding "should only"
Userspace (e.g. libc) could decide to enforce even when
SECBIT_EXEC_RESTRICT_FILE=0), i.e. if it determines not-enforcing
doesn't make sense.

User space is always in control, but I don't think it would be wise to
not follow the configuration securebits (in a generic system) because
this could result to unattended behaviors (I don't have a specific one
in mind but...).  That being said, configuration and checks are
standalones and specific/tailored systems are free to do the checks they
want.

In the case of dynamic linker, we can always enforce honoring the
execveat(AT_EXECVE_CHECK) result, right ? I can't think of a case not
to,  the dynamic linker doesn't need to check the
SECBIT_EXEC_RESTRICT_FILE bit.

If the library file is not allowed to be executed by *all* access
control systems (not just mount and file permission, but all LSMs), then
the AT_EXECVE_CHECK will fail, which is OK as long as it is not a hard
requirement.

Yes. specifically for the library loading case, I can't think of a
case where we need to by-pass LSMs.  (letting user space to by-pass
LSM check seems questionable in concept, and should only be used when
there aren't other solutions). In the context of SELINUX enforcing
mode,  we will want to enforce it. In the context of process level LSM
such as landlock,  the process can already decide for itself by
selecting the policy for its own domain, it is unnecessary to use
another opt-out solution.

My answer wasn't clear.  The execveat(AT_EXECVE_CHECK) can and should
always be done, but user space should only enforce restrictions
according to the securebits.

I knew this part (AT_EXESCVE_CHECK is called always)
Since the securebits are enforced by userspace, setting it to 0 is
equivalent to opt-out enforcement, that is what I meant by opt-out.

OK, that was confusing because these bits are set to 0 by default (for
compatibility reasons).

quoted

It doesn't make sense to talk about user space "bypassing" kernel
checks.  This patch series provides a feature to enable user space to
enforce (at its level) the same checks as the kernel.

There is no opt-out solution, but compatibility configuration bits
through securebits (which can also be set by LSMs).

To answer your question about the dynamic linker, there should be no
difference of behavior with a script interpreter.  Both should check
executability but only enforce restriction according to the securebits
(as explained in the documentation).  Doing otherwise on a generic
distro could lead to unexpected behaviors (e.g. if a user enforced a
specific SELinux policy that doesn't allow execution of library files).

quoted

There is one case where I see a difference:
ld.so a.out (when a.out is on non-exec mount)

If the dynamic linker doesn't read SECBIT_EXEC_RESTRICT_FILE setting,
above will always fail. But that is more of a bugfix.

No, the dynamic linker should only enforce restrictions according to the
securebits, otherwise a user space update (e.g. with a new dynamic
linker ignoring the securebits) could break an existing system.

OK. upgrade is a valid concern. Previously, I was just thinking about
a new LSM based on this check, not existing LSM policies.
Do you happen to know which SELinux policy/LSM could break ? i.e. it
will be applied to libraries once we add AT_EXESCVE_CHECK in the
dynamic linker.

We cannot assume anything about LSM policies because of custom and
private ones.

That is a good point.

quoted

We could give heads up and prepare for that.

quoted

Relying on the securebits to know if this is a hard
requirement or not enables system administrator and distros to control
this potential behavior change.

I think, for the dynamic linker, it can be a hard requirement.

Not on a generic distro.

Ok. Maybe this can be done through a configuration option for the
dynamic linker.

Yes, we could have a built-time option (disabled by default) for the
dynamic linker to enforce that.

quoted

The consideration I have is: securebits is currently designed to
control both dynamic linker and shell scripts.
The case for dynamic linker is simpler than scripts cases, (non-exec
mount, and perhaps some LSM policies for libraries) and distributions
such as ChromeOS can enforce the dynamic linker case ahead of scripts
interrupter cases, i.e. without waiting for python/shell being
upgraded, that can take sometimes.

For secure systems, the end goal is to always enforce such restrictions,
so once interpretation/execution of a set of file types (e.g. ELF
libraries) are tested enough in such a system, we can remove the
securebits checks for the related library/executable (e.g. ld.so) and
consider that they are always set, independently of the current
user/credentials.

Great! I agree with that.

Thanks.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help