Re: For review: documentation of clone3() system call
From: Christian Brauner <hidden>
Date: 2019-10-29 11:27:12
Also in:
linux-man, lkml
On Mon, Oct 28, 2019 at 08:09:13PM +0100, Jann Horn wrote:
On Mon, Oct 28, 2019 at 6:21 PM Christian Brauner [off-list ref] wrote:
On Mon, Oct 28, 2019 at 04:12:09PM +0100, Jann Horn wrote:
On Fri, Oct 25, 2019 at 6:59 PM Michael Kerrisk (man-pages) [off-list ref] wrote:
I've made a first shot at adding documentation for clone3(). You can see the diff here: https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=faa0e55ae9e490d71c826546bbdef954a1800969
[...]
You might want to note somewhere that its flags can't be seccomp-filtered because they're stored in memory, making it inappropriate to use in heavily sandboxed processes.

Hm, I don't think that belongs on the clone manpage. Granted, process creation is an important syscall, but so are a bunch of others that aren't filterable because of pointer arguments. We can probably mention on the seccomp manpage that seccomp can't filter on pointer arguments and then provide a list of examples. If you set up a seccomp filter and don't know that you can't filter syscalls with pointer args, that seems pretty bad to begin with.

Fair enough. [...]
One thing I never liked about clone() was that userspace had to know about stack direction. And there is a lot of ugly code in userspace that has nasty clone() wrappers like: [...]
where stack + stack_size is addition on a void pointer, which clang and gcc are usually not very happy about. I wanted to bring this up on the mailing list soon: if possible, I don't want userspace to need to know about stack direction. Instead, stack would just point to the beginning, and the kernel would do the + stack_size after copy_clone_args_from_user() if the arch needs it, for example via a dumb helper similar to copy_thread_tls()/copy_thread() that either does the + stack_size or not. Right now, clone3() is supported on parisc and, afaict, the stack grows upwards there. I'm not sure if there are obvious reasons why that won't work or why it would be a bad idea...

That would mean adding a new clone flag that redefines how those parameters work and describing the current behavior in the manpage as the behavior without the flag (which doesn't exist on 5.3), right?
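
For reference, the kind of userspace wrapper discussed in the quoted text above might look roughly like the sketch below. It is only an illustration, not code from any particular project: clone_stack_arg() and spawn_on_heap_stack() are hypothetical names, and the wrapper assumes a downward-growing stack (as on x86-64), which is exactly the architecture knowledge the quoted text would like userspace not to need.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: compute the address legacy clone() expects as its
 * stack argument. Casting to char * avoids arithmetic on void *, which is
 * a GNU extension rather than ISO C. */
static void *clone_stack_arg(void *stack, size_t stack_size, int grows_down)
{
    assert(stack != NULL && stack_size > 0);
    return grows_down ? (void *)((char *)stack + stack_size) : stack;
}

/* Hypothetical wrapper: run fn(arg) in a child process on a heap-allocated
 * stack. Returns the child's PID, or -1 on error. stack_size should be a
 * multiple of 16 so the topmost address stays suitably aligned. */
static pid_t spawn_on_heap_stack(int (*fn)(void *), void *arg, size_t stack_size)
{
    void *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    /* Assumption: the stack grows down on this architecture, so the
     * topmost address is passed. On parisc this would be wrong. */
    pid_t pid = clone(fn, clone_stack_arg(stack, stack_size, 1), SIGCHLD, arg);

    /* Without CLONE_VM the child has its own copy of the address space,
     * so releasing the parent's copy of the buffer is safe. */
    free(stack);
    return pid;
}
```

The caller would reap the child with waitpid() as usual.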
I would break API and if someone reports breakage we'll revert and go
the more complicated route you outlined (see [1]).
But I don't think it will be a big deal. First, we haven't documented how
stack needs to be passed so who knows what people currently do. Second,
clone3() has not been out for a long time and currently does _not_
provide features that legacy clone() does not provide apart from a
cleaner interface. So userspace has no incentive to use clone3() over
clone() right now. That'll change at the latest with v5.5, where we have new
features on top of clone3() (CLONE_CLEAR_SIGHAND). So let's just try and
fix it.
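
To illustrate what the change would mean for callers: under the semantics proposed above (stack pointing at the lowest address, with the kernel adding stack_size where the arch needs it), filling struct clone_args would look the same on every architecture. A minimal sketch, assuming <linux/sched.h> from kernel headers 5.3 or later; fill_clone_args() is a hypothetical helper and does not itself issue the syscall:

```c
#include <assert.h>
#include <linux/sched.h>   /* struct clone_args; assumes kernel headers >= 5.3 */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: prepare arguments for clone3() under the proposed
 * semantics, i.e. stack is the lowest address of the buffer and userspace
 * never adds stack_size itself. */
static void fill_clone_args(struct clone_args *args, void *stack, size_t stack_size)
{
    assert(args != NULL);
    memset(args, 0, sizeof(*args));            /* zero everything not set below */
    args->exit_signal = SIGCHLD;
    args->stack = (uint64_t)(uintptr_t)stack;  /* no "+ stack_size" here */
    args->stack_size = stack_size;
}
```

The structure would then be passed as syscall(SYS_clone3, &args, sizeof(args)). Actually returning onto a new stack from a raw syscall needs an architecture-specific trampoline, which this sketch leaves out.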
[1]: This is basically what Linus has repeatedly said: it's not about
never breaking API in principle but rather about whether this
breaks someone's use case. And if it does break, we need to revert.
Christian