Inter-revision diff: patch 5

Comparing v4 (message) to v7 (message)

--- v4
+++ v7
@@ -1,5 +1,35 @@
-This sysctl enables to propagate executable permission to userspace
-thanks to the O_MAYEXEC flag.
+Allow for the enforcement of the O_MAYEXEC openat2(2) flag.  Thanks to
+the noexec option from the underlying VFS mount, or to the file execute
+permission, userspace can enforce these execution policies.  This may
+allow script interpreters to check execution permission before reading
+commands from a file, or dynamic linkers to allow shared object loading.
+
+Add a new sysctl fs.open_mayexec_enforce to enable system administrators
+to enforce two complementary security policies according to the
+installed system: enforce the noexec mount option, and enforce
+executable file permission.  Indeed, because of compatibility with
+installed systems, only system administrators are able to check that
+this new enforcement is in line with the system mount points and file
+permissions.  A following patch adds documentation.
+
+Being able to restrict execution also enables to protect the kernel by
+restricting arbitrary syscalls that an attacker could perform with a
+crafted binary or certain script languages.  It also improves multilevel
+isolation by reducing the ability of an attacker to use side channels
+with specific code.  These restrictions can natively be enforced for ELF
+binaries (with the noexec mount option) but require this kernel
+extension to properly handle scripts (e.g., Python, Perl).  To get a
+consistent execution policy, additional memory restrictions should also
+be enforced (e.g. thanks to SELinux).
+
+Because the O_MAYEXEC flag is a meant to enforce a system-wide security
+policy (but not application-centric policies), it does not make sense
+for userland to check the sysctl value.  Indeed, this new flag only
+enables to extend the system ability to enforce a policy thanks to (some
+trusted) userland collaboration.  Moreover, additional security policies
+could be managed by LSMs.  This is a best-effort approach from the
+application developer point of view:
+https://lore.kernel.org/lkml/1477d3d7-4b36-afad-7077-a38f42322238@digikod.net/
 
 Signed-off-by: Mickaël Salaün <mic@digikod.net>
 Reviewed-by: Thibaut Sautereau <thibaut.sautereau@ssi.gouv.fr>
@@ -7,24 +37,71 @@
 Cc: Al Viro <viro@zeniv.linux.org.uk>
 Cc: Jonathan Corbet <corbet@lwn.net>
 Cc: Kees Cook <keescook@chromium.org>
+Cc: Randy Dunlap <rdunlap@infradead.org>
 ---
 
+Changes since v6:
+* Allow opening pipes, block devices and character devices with
+  O_MAYEXEC when there is no enforced policy, but forbid any non-regular
+  file opened with O_MAYEXEC otherwise (i.e. for any enforced policy).
+* Add a paragraph about the non-regular files policy.
+* Move path_noexec() calls out of the fast-path (suggested by Kees
+  Cook).
+
+Changes since v5:
+* Remove the static enforcement configuration through Kconfig because it
+  makes the code more simple like this, and because the current sysctl
+  configuration can only be set with CAP_SYS_ADMIN, the same way mount
+  options (i.e. noexec) can be set.  If an harden distro wants to
+  enforce a configuration, it should restrict capabilities or sysctl
+  configuration.  Furthermore, an LSM can easily leverage O_MAYEXEC to
+  fit its need.
+* Move checks from inode_permission() to may_open() and make the error
+  codes more consistent according to file types (in line with a previous
+  commit): opening a directory with O_MAYEXEC returns EISDIR and other
+  non-regular file types may return EACCES.
+* In may_open(), when OMAYEXEC_ENFORCE_FILE is set, replace explicit
+  call to generic_permission() with an artificial MAY_EXEC to avoid
+  double calls.  This makes sense especially when an LSM policy forbids
+  execution of a file.
+* Replace the custom proc_omayexec() with
+  proc_dointvec_minmax_sysadmin(), and then replace the CAP_MAC_ADMIN
+  check with a CAP_SYS_ADMIN one (suggested by Kees Cook and Stephen
+  Smalley).
+* Use BIT() (suggested by Kees Cook).
+* Rename variables (suggested by Kees Cook).
+* Reword the kconfig help.
+* Import the documentation patch (suggested by Kees Cook):
+  https://lore.kernel.org/lkml/20200505153156.925111-6-mic@digikod.net/
+* Update documentation and add LWN.net article.
+
+Changes since v4:
+* Add kernel configuration options to enforce O_MAYEXEC at build time,
+  and disable the sysctl in such case (requested by James Morris).
+* Reword commit message.
+
 Changes since v3:
-* Switch back to O_MAYEXEC and highlight that it is only taken into
-  account by openat2(2).
+* Update comment with O_MAYEXEC.
 
 Changes since v2:
-* Update documentation with the new RESOLVE_MAYEXEC.
-* Improve explanations, including concerns about LD_PRELOAD.
+* Cosmetic changes.
 
 Changes since v1:
-* Move from LSM/Yama to sysctl/fs .
+* Move code from Yama to the FS subsystem (suggested by Kees Cook).
+* Make omayexec_inode_permission() static (suggested by Jann Horn).
+* Use mode 0600 for the sysctl.
+* Only match regular files (not directories nor other types), which
+  follows the same semantic as commit 73601ea5b7b1 ("fs/open.c: allow
+  opening only regular files during execve()").
 ---
- Documentation/admin-guide/sysctl/fs.rst | 44 +++++++++++++++++++++++++
- 1 file changed, 44 insertions(+)
+ Documentation/admin-guide/sysctl/fs.rst | 49 +++++++++++++++++++++++++
+ fs/namei.c                              | 24 ++++++++++++
+ include/linux/fs.h                      |  1 +
+ kernel/sysctl.c                         | 12 +++++-
+ 4 files changed, 84 insertions(+), 2 deletions(-)
 
 diff --git a/Documentation/admin-guide/sysctl/fs.rst b/Documentation/admin-guide/sysctl/fs.rst
-index 2a45119e3331..d55615c36772 100644
+index 2a45119e3331..ce6e2081d3a9 100644
 --- a/Documentation/admin-guide/sysctl/fs.rst
 +++ b/Documentation/admin-guide/sysctl/fs.rst
 @@ -37,6 +37,7 @@ Currently, these files are in /proc/sys/fs:
@@ -35,7 +112,7 @@
  - overflowuid
  - overflowgid
  - pipe-user-pages-hard
-@@ -165,6 +166,49 @@ system needs to prune the inode list instead of allocating
+@@ -165,6 +166,54 @@ system needs to prune the inode list instead of allocating
  more.
  
  
@@ -52,23 +129,23 @@
 +
 +The ability to restrict code execution must be thought as a system-wide policy,
 +which first starts by restricting mount points with the ``noexec`` option.
-+This option is also automatically applied to special filesystems such as /proc
-+.  This prevents files on such mount points to be directly executed by the
-+kernel or mapped as executable memory (e.g. libraries).  With script
-+interpreters using the ``O_MAYEXEC`` flag, the executable permission can then
-+be checked before reading commands from files. This makes it possible to
-+enforce the ``noexec`` at the interpreter level, and thus propagates this
-+security policy to scripts.  To be fully effective, these interpreters also
-+need to handle the other ways to execute code: command line parameters (e.g.,
-+option ``-e`` for Perl), module loading (e.g., option ``-m`` for Python),
-+stdin, file sourcing, environment variables, configuration files, etc.
-+According to the threat model, it may be acceptable to allow some script
-+interpreters (e.g. Bash) to interpret commands from stdin, may it be a TTY or a
-+pipe, because it may not be enough to (directly) perform syscalls.
++This option is also automatically applied to special filesystems such as /proc .
++This prevents files on such mount points to be directly executed by the kernel
++or mapped as executable memory (e.g. libraries).  With script interpreters
++using the ``O_MAYEXEC`` flag, the executable permission can then be checked
++before reading commands from files. This makes it possible to enforce the
++``noexec`` at the interpreter level, and thus propagates this security policy
++to scripts.  To be fully effective, these interpreters also need to handle the
++other ways to execute code: command line parameters (e.g., option ``-e`` for
++Perl), module loading (e.g., option ``-m`` for Python), stdin, file sourcing,
++environment variables, configuration files, etc.  According to the threat
++model, it may be acceptable to allow some script interpreters (e.g. Bash) to
++interpret commands from stdin, may it be a TTY or a pipe, because it may not be
++enough to (directly) perform syscalls.
 +
 +There are two complementary security policies: enforce the ``noexec`` mount
 +option, and enforce executable file permission.  These policies are handled by
-+the ``fs.open_mayexec_enforce`` sysctl (writable only with ``CAP_MAC_ADMIN``)
++the ``fs.open_mayexec_enforce`` sysctl (writable only with ``CAP_SYS_ADMIN``)
 +as a bitmask:
 +
 +1 - Mount restriction: checks that the mount options for the underlying VFS
@@ -77,14 +154,129 @@
 +2 - File permission restriction: checks that the to-be-opened file is marked as
 +    executable for the current process (e.g., POSIX permissions).
 +
++Note that as long as a policy is enforced, opening any non-regular file with
++``O_MAYEXEC`` is denied (e.g. TTYs, pipe), even when such a file is marked as
++executable or is on an executable mount point.
++
 +Code samples can be found in tools/testing/selftests/openat2/omayexec_test.c
-+and at
++and interpreter patches (for the original O_MAYEXEC version) may be found at
 +https://github.com/clipos-archive/clipos4_portage-overlay/search?q=O_MAYEXEC .
++See also an overview article: https://lwn.net/Articles/820000/ .
 +
 +
  overflowgid & overflowuid
  -------------------------
  
+diff --git a/fs/namei.c b/fs/namei.c
+index 3f074ec77390..8ec13c7fd403 100644
+--- a/fs/namei.c
++++ b/fs/namei.c
+@@ -39,6 +39,7 @@
+ #include <linux/bitops.h>
+ #include <linux/init_task.h>
+ #include <linux/uaccess.h>
++#include <linux/sysctl.h>
+ 
+ #include "internal.h"
+ #include "mount.h"
+@@ -425,6 +426,11 @@ static int sb_permission(struct super_block *sb, struct inode *inode, int mask)
+ 	return 0;
+ }
+ 
++#define OPEN_MAYEXEC_ENFORCE_MOUNT	BIT(0)
++#define OPEN_MAYEXEC_ENFORCE_FILE	BIT(1)
++
++int sysctl_open_mayexec_enforce __read_mostly;
++
+ /**
+  * inode_permission - Check for access rights to a given inode
+  * @inode: Inode to check permission on
+@@ -2861,11 +2867,29 @@ static int may_open(const struct path *path, int acc_mode, int flag)
+ 	case S_IFSOCK:
+ 		if (acc_mode & MAY_EXEC)
+ 			return -EACCES;
++		/*
++		 * Opening devices (e.g. TTYs) or pipes with O_MAYEXEC may be
++		 * legitimate when there is no enforced policy.
++		 */
++		if ((acc_mode & MAY_OPENEXEC) && sysctl_open_mayexec_enforce)
++			return -EACCES;
+ 		flag &= ~O_TRUNC;
+ 		break;
+ 	case S_IFREG:
+ 		if ((acc_mode & MAY_EXEC) && path_noexec(path))
+ 			return -EACCES;
++		if (acc_mode & MAY_OPENEXEC) {
++			if ((sysctl_open_mayexec_enforce & OPEN_MAYEXEC_ENFORCE_MOUNT)
++					&& path_noexec(path))
++				return -EACCES;
++			if (sysctl_open_mayexec_enforce & OPEN_MAYEXEC_ENFORCE_FILE)
++				/*
++				 * Because acc_mode may change here, the next and only
++				 * use of acc_mode should then be by the following call
++				 * to inode_permission().
++				 */
++				acc_mode |= MAY_EXEC;
++		}
+ 		break;
+ 	}
+ 
+diff --git a/include/linux/fs.h b/include/linux/fs.h
+index 56f835c9a87a..071f37707ccc 100644
+--- a/include/linux/fs.h
++++ b/include/linux/fs.h
+@@ -83,6 +83,7 @@ extern int sysctl_protected_symlinks;
+ extern int sysctl_protected_hardlinks;
+ extern int sysctl_protected_fifos;
+ extern int sysctl_protected_regular;
++extern int sysctl_open_mayexec_enforce;
+ 
+ typedef __kernel_rwf_t rwf_t;
+ 
+diff --git a/kernel/sysctl.c b/kernel/sysctl.c
+index db1ce7af2563..5008a2566e79 100644
+--- a/kernel/sysctl.c
++++ b/kernel/sysctl.c
+@@ -113,6 +113,7 @@ static int sixty = 60;
+ 
+ static int __maybe_unused neg_one = -1;
+ static int __maybe_unused two = 2;
++static int __maybe_unused three = 3;
+ static int __maybe_unused four = 4;
+ static unsigned long zero_ul;
+ static unsigned long one_ul = 1;
+@@ -888,7 +889,6 @@ static int proc_taint(struct ctl_table *table, int write,
+ 	return err;
+ }
+ 
+-#ifdef CONFIG_PRINTK
+ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
+ 				void *buffer, size_t *lenp, loff_t *ppos)
+ {
+@@ -897,7 +897,6 @@ static int proc_dointvec_minmax_sysadmin(struct ctl_table *table, int write,
+ 
+ 	return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+ }
+-#endif
+ 
+ /**
+  * struct do_proc_dointvec_minmax_conv_param - proc_dointvec_minmax() range checking structure
+@@ -3264,6 +3263,15 @@ static struct ctl_table fs_table[] = {
+ 		.extra1		= SYSCTL_ZERO,
+ 		.extra2		= &two,
+ 	},
++	{
++		.procname       = "open_mayexec_enforce",
++		.data           = &sysctl_open_mayexec_enforce,
++		.maxlen         = sizeof(int),
++		.mode           = 0600,
++		.proc_handler	= proc_dointvec_minmax_sysadmin,
++		.extra1		= SYSCTL_ZERO,
++		.extra2		= &three,
++	},
+ #if defined(CONFIG_BINFMT_MISC) || defined(CONFIG_BINFMT_MISC_MODULE)
+ 	{
+ 		.procname	= "binfmt_misc",
 -- 
-2.26.2
-
+2.27.0
+
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help