Thread (22 messages), 5 authors, 2019-11-14

Re: For review: documentation of clone3() system call

From: Christian Brauner <hidden>
Date: 2019-10-29 11:27:12
Also in: linux-man, lkml

On Mon, Oct 28, 2019 at 08:09:13PM +0100, Jann Horn wrote:
On Mon, Oct 28, 2019 at 6:21 PM Christian Brauner
[off-list ref] wrote:
quoted
On Mon, Oct 28, 2019 at 04:12:09PM +0100, Jann Horn wrote:
quoted
On Fri, Oct 25, 2019 at 6:59 PM Michael Kerrisk (man-pages)
[off-list ref] wrote:
quoted
I've made a first shot at adding documentation for clone3(). You can
see the diff here:
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=faa0e55ae9e490d71c826546bbdef954a1800969
[...]
quoted
quoted
You might want to note somewhere that its flags can't be
seccomp-filtered because they're stored in memory, making it
inappropriate to use in heavily sandboxed processes.
Hm, I don't think that belongs on the clone manpage. Granted that
process creation is an important syscall but so are a bunch of others
that aren't filterable because of pointer arguments.
We can probably mention on the seccomp manpage that seccomp can't filter
on pointer arguments and then provide a list of examples. If you set up a
seccomp filter and don't know that you can't filter syscalls with
pointer args that seems pretty bad to begin with.
Fair enough.
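
To make the point about pointer arguments concrete, here is a minimal sketch in C. It assumes the x86-64 argument order for clone(), and every demo_ name is hypothetical: the structs mirror the layouts of struct seccomp_data and the original 64-byte struct clone_args but are redefined so the example builds without kernel headers. It is an illustration of what a filter gets to look at, not kernel or libseccomp code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as struct seccomp_data in <linux/seccomp.h>,
 * redefined under a demo_ name (assumption: built without kernel headers). */
struct demo_seccomp_data {
    int32_t  nr;
    uint32_t arch;
    uint64_t instruction_pointer;
    uint64_t args[6];           /* raw syscall argument registers */
};

/* Fields of the original struct clone_args as introduced in 5.3. */
struct demo_clone_args {
    uint64_t flags;
    uint64_t pidfd;
    uint64_t child_tid;
    uint64_t parent_tid;
    uint64_t exit_signal;
    uint64_t stack;
    uint64_t stack_size;
    uint64_t tls;
};

static_assert(sizeof(struct demo_clone_args) == 64, "first clone_args version is 64 bytes");

#define DEMO_CLONE_NEWUSER 0x10000000ULL   /* numeric value of CLONE_NEWUSER */

/* Legacy clone(): the flags travel in a register (args[0] on x86-64). */
void demo_record_clone(struct demo_seccomp_data *sd, uint64_t flags)
{
    sd->args[0] = flags;
}

/* clone3(): the registers only hold a user pointer and a size. */
void demo_record_clone3(struct demo_seccomp_data *sd,
                        const struct demo_clone_args *ca)
{
    sd->args[0] = (uint64_t)(uintptr_t)ca;
    sd->args[1] = sizeof(*ca);
}

/* A filter can compare register values but cannot dereference them. */
int demo_filter_sees_flag(const struct demo_seccomp_data *sd, uint64_t flag)
{
    return (sd->args[0] & flag) != 0;
}
```

With clone(), demo_filter_sees_flag() inspects the real flags word. With clone3(), the same check would only be testing bits of an address, which says nothing about the flags stored behind it.
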

[...]
quoted
One thing I never liked about clone() was that userspace had to know
about stack direction. And there is a lot of ugly code in userspace that
has nasty clone() wrappers like:
[...]
quoted
where stack + stack_size is addition on a void pointer, which clang and
gcc are usually not very happy about.
I wanted to bring this up on the mailing list soon: If possible, I don't
want userspace to need to know about stack direction and just have stack
point to the beginning and then have the kernel do the + stack_size
after the copy_clone_args_from_user() if the arch needs it. For example,
by having a dumb helper similar to copy_thread_tls()/copy_thread() that
either does the + stack_size or not. Right now, clone3() is supported on
parisc and afaict, the stack grows upwards for it. I'm not sure if there
are obvious reasons why that won't work or it would be a bad idea...
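
The two conventions discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the demo_ names and the direction enum are hypothetical, and demo_kernel_initial_sp() stands in for the kind of per-arch helper being proposed, not for any existing kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag for the direction in which an architecture's stack grows. */
enum demo_stack_dir { DEMO_STACK_GROWS_DOWN, DEMO_STACK_GROWS_UP };

/* Legacy clone(): userspace must hand over the address where the child's
 * stack starts, so it has to know the direction itself. The cast to
 * char * avoids arithmetic on void *, which is only a GNU extension. */
void *demo_legacy_stack_arg(void *base, size_t size, enum demo_stack_dir dir)
{
    assert(base != NULL);
    return dir == DEMO_STACK_GROWS_DOWN ? (void *)((char *)base + size) : base;
}

/* Proposed convention: userspace always passes the lowest address plus the
 * size, and a kernel-side helper (imagined here) adds the size only when
 * the stack grows down. */
uint64_t demo_kernel_initial_sp(uint64_t stack, uint64_t stack_size,
                                enum demo_stack_dir dir)
{
    return dir == DEMO_STACK_GROWS_DOWN ? stack + stack_size : stack;
}
```

Under the second convention the caller's code is identical on every architecture: it passes the buffer it allocated and its length, and the direction check lives in one place.
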
That would mean adding a new clone flag that redefines how those
parameters work and describing the current behavior in the manpage as
the behavior without the flag (which doesn't exist on 5.3), right?
I would break API and if someone reports breakage we'll revert and go
the more complicated route you outlined (see [1]).
But I don't think it will be a big deal. First, we haven't documented how
stack needs to be passed so who knows what people currently do. Second,
clone3() has not been out for a long time and currently does _not_
provide features that legacy clone() does not provide apart from a
cleaner interface. So userspace has no incentive to use clone3() over
clone() right now. That'll change at the latest with v5.5 where we have new
features on top of clone3() (CLONE_CLEAR_SIGHAND). So let's just try and
fix it.

[1]: This is basically what Linus has repeatedly said: it's not about
     never breaking API in principle but rather about whether this
     breaks someone's use case. And if it does break, we need to revert.

Christian