Thread: 17 messages, 7 authors, 2014-10-25

Re: [PATCHv4 RESEND 0/3] syscalls,x86: Add execveat() system call

From: Christoph Hellwig <hch@infradead.org>
Date: 2014-10-22 11:54:18
Also in: linux-arch, lkml

[adding Rich Felker to the Cc list, who has been very interested in an
O_SEARCH implementation, for which this would be an important building
block]

On Fri, Oct 17, 2014 at 02:45:03PM -0700, Andy Lutomirski wrote:
[Added Eric Biederman, since I think your tree might be a reasonable
route forward for these patches.]

On Thu, Jun 5, 2014 at 6:40 AM, David Drysdale [off-list ref] wrote:
Resending, adding cc:linux-api.

Also, it may help to add a little more background -- this patch is
needed as a (small) part of implementing Capsicum in the Linux kernel.

Capsicum is a security framework that has been present in FreeBSD since
version 9.0 (Jan 2012), and is based on concepts from object-capability
security [1].

One of the features of Capsicum is capability mode, which locks down
access to global namespaces such as the filesystem hierarchy.  In
capability mode, /proc is thus inaccessible and so fexecve(3) doesn't
work -- hence the need for a kernel-space
I just found myself wanting this syscall for another reason: injecting
programs into sandboxes or otherwise heavily locked-down namespaces.

For example, I want to be able to reliably do something like nsenter
--namespace-flags-here toybox sh.  Toybox's shell is unusual in that
it is more or less fully functional, so this should Just Work (tm),
except that the toybox binary might not exist in the namespace being
entered.  If execveat were available, I could rig nsenter or a similar
tool to open it with O_CLOEXEC, enter the namespace, and then call
execveat.

Is there any reason that these patches can't be merged more or less as
is for 3.19?

--Andy
[1] http://www.cl.cam.ac.uk/research/security/capsicum/papers/2010usenix-security-capsicum-website.pdf

------

This patch set adds execveat(2) for x86, and is derived from Meredydd
Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).

The primary aim of adding an execveat syscall is to allow an
implementation of fexecve(3) that does not rely on the /proc
filesystem.  The current glibc version of fexecve(3) is implemented
via /proc, which causes problems in sandboxed or otherwise restricted
environments.
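
For readers unfamiliar with the /proc dependency, the following is a
simplified sketch of the approach, not glibc's actual code: the
descriptor is turned into a /proc/self/fd/N path and handed to plain
execve(2). The helper names proc_fd_path() and fexecve_via_proc() are
made up for this illustration. If /proc is not mounted or not reachable,
the execve() call fails because the path does not resolve.

```c
#include <stdio.h>   /* snprintf */
#include <unistd.h>  /* execve */

/* Format the /proc path that refers to an open descriptor.
 * Returns the path length, or -1 if buf is too small. */
static int proc_fd_path(int fd, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/proc/self/fd/%d", fd);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Execute the program behind fd by going through /proc.
 * Returns -1 on failure; does not return on success. */
static int fexecve_via_proc(int fd, char *const argv[], char *const envp[])
{
    char path[64];

    if (proc_fd_path(fd, path, sizeof path) == -1)
        return -1;

    /* This is the step that breaks without /proc: the kernel resolves
     * the path like any other, so it must exist in the caller's view
     * of the filesystem. */
    execve(path, argv, envp);
    return -1;
}
```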

Given the desire for a /proc-free fexecve() implementation, HPA
suggested (https://lkml.org/lkml/2006/7/11/556) that an execveat(2)
syscall would be an appropriate generalization.

Also, having a new syscall means that it can take a flags argument
without back-compatibility concerns.  The current implementation just
defines the AT_SYMLINK_NOFOLLOW flag, but other flags could be added
in future -- for example, flags for new namespaces (as suggested at
https://lkml.org/lkml/2006/7/11/474).

Related history:
 - https://lkml.org/lkml/2006/12/27/123 is an example of someone
   realizing that fexecve() is likely to fail in a chroot environment.
 - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
   documenting the /proc requirement of fexecve(3) in its manpage, to
   "prevent other people from wasting their time".
 - https://bugzilla.kernel.org/show_bug.cgi?id=74481 documented that
   it's not possible to fexecve() a file descriptor for a script with
   close-on-exec set (which is possible with the implementation here).
 - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
   problem where a process that did setuid() could not fexecve()
   because it no longer had access to /proc/self/fd; this has since
   been fixed.


Changes since Meredydd's v3 patch:
 - Added a selftest.
 - Added a man page.
 - Left open_exec() signature untouched to reduce patch impact
   elsewhere (as suggested by Al Viro).
 - Filled in bprm->filename with d_path() into a buffer, to avoid use
   of potentially-ephemeral dentry->d_name.
 - Patch against v3.14 (455c6fdbd21916).


David Drysdale (2):
  syscalls,x86: implement execveat() system call
  syscalls,x86: add selftest for execveat(2)

 arch/x86/ia32/audit.c                   |   1 +
 arch/x86/ia32/ia32entry.S               |   1 +
 arch/x86/kernel/audit_64.c              |   1 +
 arch/x86/kernel/entry_64.S              |  28 ++++
 arch/x86/syscalls/syscall_32.tbl        |   1 +
 arch/x86/syscalls/syscall_64.tbl        |   2 +
 arch/x86/um/sys_call_table_64.c         |   1 +
 fs/exec.c                               | 153 ++++++++++++++++---
 include/linux/compat.h                  |   3 +
 include/linux/sched.h                   |   4 +
 include/linux/syscalls.h                |   4 +
 include/uapi/asm-generic/unistd.h       |   4 +-
 kernel/sys_ni.c                         |   3 +
 lib/audit.c                             |   3 +
 tools/testing/selftests/Makefile        |   1 +
 tools/testing/selftests/exec/.gitignore |   6 +
 tools/testing/selftests/exec/Makefile   |  32 ++++
 tools/testing/selftests/exec/execveat.c | 251 ++++++++++++++++++++++++++++++++
 18 files changed, 476 insertions(+), 23 deletions(-)
 create mode 100644 tools/testing/selftests/exec/.gitignore
 create mode 100644 tools/testing/selftests/exec/Makefile
 create mode 100644 tools/testing/selftests/exec/execveat.c

--
1.9.1.423.g4596e3a
--
To unsubscribe from this list: send the line "unsubscribe linux-api" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
Andy Lutomirski
AMA Capital Management, LLC
---end quoted text---