Re: [PATCHv4 RESEND 0/3] syscalls,x86: Add execveat() system call
From: Andy Lutomirski <luto@amacapital.net>
Date: 2014-10-19 23:35:31
Also in: linux-arch, lkml
On Sun, Oct 19, 2014 at 3:42 PM, Al Viro wrote:
> On Sun, Oct 19, 2014 at 03:16:03PM -0700, Andy Lutomirski wrote:
>> Oh, you mean that #!/usr/bin/make -f would turn into /usr/bin/make /dev/fd/3? That could be interesting, although I can imagine it breaking things, especially if /dev/fd/3 isn't set up like that, e.g. early in boot.
>
> Sigh... What I mean is that fexecve(fd, ...) would have to put _something_ into argv when it execs the interpreter of #! file. Simply because the interpreter (which can be anything whatsoever) has no fscking idea what to do with some descriptor it has before execve(). Hell, it doesn't have any idea *which* descriptor had it been. You need to put some pathname that would yield your script upon open(2). If you bothered to read those patches, you'd see that they do supply one, generating it with d_path(). Which isn't particularly reliable. I'm not sure there's any point putting any of that in the kernel - if you *do* have that pathname, you can just pass it.
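To make the argv point concrete: when a #! script is executed, Linux rebuilds argv as the interpreter path, the optional single argument from the #! line, a pathname for the script (replacing the caller's argv[0]), and then the caller's remaining arguments. The C sketch below models only that rewriting in user space. `build_interp_argv()` and `free_argv()` are hypothetical helpers, and the `script_path` parameter stands in for whatever name the kernel would have to supply (a d_path() result in these patches, or a string like /dev/fd/3 as in the example above).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Blank characters that separate the interpreter from its argument. */
static int is_blank(char c) { return c == ' ' || c == '\t'; }

/*
 * Hypothetical user-space model of the argv rewrite for a "#!" script:
 *   { interpreter, [optional argument], script_path, orig_argv[1..], NULL }
 * script_path replaces the caller's argv[0]. Everything after the
 * interpreter name (trailing blanks removed) is kept as ONE argument.
 * Returns a malloc'd NULL-terminated array, or NULL if `line` is not a
 * usable "#!" line. Per-string allocation failures are not checked here.
 */
char **build_interp_argv(const char *line, const char *script_path,
                         char *const orig_argv[])
{
    if (strncmp(line, "#!", 2) != 0)
        return NULL;
    const char *p = line + 2;
    while (is_blank(*p))
        p++;
    const char *interp = p;                  /* interpreter ends at a blank */
    while (*p && !is_blank(*p) && *p != '\n')
        p++;
    size_t interp_len = (size_t)(p - interp);
    if (interp_len == 0)
        return NULL;

    while (is_blank(*p))
        p++;
    const char *arg = p;                     /* rest of the line, if any */
    const char *end = strchr(arg, '\n');
    if (!end)
        end = arg + strlen(arg);
    while (end > arg && is_blank(end[-1]))
        end--;
    size_t arg_len = (size_t)(end - arg);

    size_t extra = 0;                        /* number of orig_argv[1..] */
    if (orig_argv && orig_argv[0])
        while (orig_argv[1 + extra])
            extra++;

    /* interpreter + optional arg + script path + extras + NULL */
    char **out = calloc(3 + extra + 1, sizeof *out);
    if (!out)
        return NULL;
    size_t n = 0;
    out[n++] = strndup(interp, interp_len);
    if (arg_len > 0)
        out[n++] = strndup(arg, arg_len);
    out[n++] = strdup(script_path);
    for (size_t i = 0; i < extra; i++)
        out[n++] = strdup(orig_argv[1 + i]);
    assert(n <= 3 + extra);                  /* room left for the NULL */
    out[n] = NULL;
    return out;
}

/* Free an array returned by build_interp_argv(). */
void free_argv(char **argv)
{
    for (size_t i = 0; argv && argv[i]; i++)
        free(argv[i]);
    free(argv);
}
```

Called with "#!/usr/bin/make -f\n", "/dev/fd/3" and an original argv of {"./build.mk", "all", NULL}, the model yields {"/usr/bin/make", "-f", "/dev/fd/3", "all", NULL}. Whatever string lands in that third slot is all the interpreter gets; it has no way to learn which descriptor the script came from. The real kernel also caps the #! line length and handles corner cases this sketch ignores.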
Hmm. This issue certainly makes fexecve or execveat less attractive, at least in cases where d_path won't work. On the other hand, if you want to run a static binary on a cloexec fd (or, for that matter, a dynamic binary if you trust the interpreter to close the extra copy of the fd it gets) in a namespace or chroot where the binary is invisible, then you need kernel help. It's too bad that script interpreters don't have a mechanism to open their scripts by fd.
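The binary case can be sketched with POSIX fexecve(3). On Linux, recent glibc versions implement it with execveat() where the kernel provides it and fall back to /proc/self/fd otherwise. This assumes a Linux system; `run_from_fd()` is a hypothetical helper, not part of the patch set. Because the descriptor is opened O_CLOEXEC it is closed in the new image, which is fine for an ELF binary but is exactly the #! problem for a script: the interpreter would be handed a name it can no longer open.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: run the executable behind `fd` in a child process
 * with fexecve(3). Returns the child's exit status (127 if fexecve itself
 * failed), or -1 if fork/waitpid failed or the child did not exit normally.
 * `fd` may be O_CLOEXEC: for an ELF binary that is harmless, because the
 * kernel keeps its own reference to the file it has loaded.
 */
int run_from_fd(int fd, char *const argv[])
{
    assert(fd >= 0 && argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fexecve(fd, argv, environ);
        _exit(127);                 /* reached only if fexecve failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, opening /bin/true with O_RDONLY | O_CLOEXEC and passing {"true", NULL} should return 0, and /bin/false should return 1. For the execveat() that was eventually merged, the execveat(2) man page documents that executing a script through a close-on-exec descriptor fails with ENOENT, since the /dev/fd/N name given to the interpreter would refer to a descriptor that no longer exists.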
>> Aside from the general scariness of allowing one process to actually dup another process's fds, I feel like this is asking for trouble wrt the various types of file locks.
>
> Who said anything about another process's fds? That, indeed, would be a recipe for serious trouble. It's a filesystem with one directory, not with one directory for each process...
This still has issues with locks if you pass an fd to a child process, but I guess that you get what you ask for if you do that.
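One concrete place where the dup-versus-reopen choice shows up in lock behaviour: flock(2) locks belong to the open file description, so a dup()ed descriptor shares an existing lock, while a descriptor obtained by opening Linux's /proc/self/fd/N is a new description and competes with it. The sketch assumes Linux with /proc mounted; `reopen_via_proc()` and `can_relock()` are hypothetical helpers written for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open a *new* open file description for `fd` through
   procfs, which is what opening /dev/fd/N amounts to on Linux. */
int reopen_via_proc(int fd, int flags)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    assert(n > 0 && (size_t)n < sizeof path);   /* path fit in the buffer */
    return open(path, flags);
}

/* Hypothetical helper: try a non-blocking exclusive flock() through `other`
   while another descriptor already holds LOCK_EX on the same file.
   Returns 1 if it succeeds (same open file description, same lock),
   0 if it would block (separate description, competing lock), -1 otherwise. */
int can_relock(int other)
{
    if (flock(other, LOCK_EX | LOCK_NB) == 0)
        return 1;
    return errno == EWOULDBLOCK ? 0 : -1;
}
```

After flock(fd, LOCK_EX) on a temporary file, can_relock(dup(fd)) is expected to return 1 and can_relock(reopen_via_proc(fd, O_RDWR)) to return 0. POSIX fcntl() record locks behave differently again: they are owned by the process and are released when it closes any descriptor for the file.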
> FWIW, they (Plan 9) do have procfs and there they have /proc/<pid>/fd. Which is a regular file, with contents consisting of \n-terminated lines (one per descriptor in <pid>'s descriptor table) in the same format as in *ctl (they put descriptor number as the first field in those). Unlike our solution, they do not allow access to any process' files via procfs. They do allow /dev/stdin-style access to your own files via dupfs. And yes, for /dev/stdin and friends dup-style semantics is better - you get consistent behaviours for pipes and redirects from file that way. See the example I've posted upthread. Besides, for things like sockets our semantics simply fails - they really depend on having only one struct file for given socket; it's dup or nothing there. The same goes for a lot of things like eventfd, etc.
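The offset and socket points can be checked directly on Linux. The sketch assumes /proc is mounted and mkstemp() can create files under /tmp, and uses hypothetical helpers: `make_temp_with()` creates an unlinked temporary file holding some bytes, `reopen_via_proc()` opens /proc/self/fd/N (the Linux semantics behind /dev/stdin), `offset_of()` reports a descriptor's file offset, and `socket_reopen_refused()` tries both approaches on a socket. A dup() shares the offset of the original description, a /proc reopen of a regular file starts again at offset 0, and reopening a socket this way is refused outright.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a new description for `fd` through procfs,
   which is what opening /dev/stdin or /dev/fd/N amounts to on Linux. */
int reopen_via_proc(int fd, int flags)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    assert(n > 0 && (size_t)n < sizeof path);   /* path fit in the buffer */
    return open(path, flags);
}

/* Hypothetical helper: the current file offset of `fd`'s description. */
off_t offset_of(int fd)
{
    return lseek(fd, 0, SEEK_CUR);
}

/* Hypothetical helper: an already-unlinked temporary file holding `data`,
   with its offset left at the end of what was written. */
int make_temp_with(const char *data)
{
    char path[] = "/tmp/fd-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                         /* reachable only through fd now */
    size_t len = strlen(data);
    if (write(fd, data, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: returns 1 if dup() of a socket works but reopening
   it via procfs is refused, 0 otherwise. */
int socket_reopen_refused(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return 0;
    int d = dup(sv[0]);
    int r = reopen_via_proc(sv[0], O_RDWR);
    int refused = (d >= 0 && r < 0);
    if (d >= 0)
        close(d);
    if (r >= 0)
        close(r);
    close(sv[0]);
    close(sv[1]);
    return refused;
}
```

With make_temp_with("hello"), both the original descriptor and its dup() report offset 5, while the /proc reopen reports 0; a program reading /dev/stdin therefore sees different data depending on whether stdin is a pipe or a redirected file. The refused socket open is expected to fail with ENXIO.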
Fair enough.

--Andy