Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup
From: Jann Horn <jannh@google.com>
Date: 2026-06-09 17:54:09
Also in:
linux-arch, linux-doc, linux-fsdevel, linux-kselftest, linux-mm, lkml
On Tue, Jun 9, 2026 at 8:08 AM Florian Weimer [off-list ref] wrote:
> * Jann Horn:
>
> > > Per the above, the primary win would stem from *NOT* messing with mm.
> >
> > As you write below, I think we have that with CLONE_VM? The C function vfork() is kind of a terrible API because of its returns-twice behavior, but I think if process cloning with CLONE_VM|CLONE_VFORK was wrapped by libc in a way similar to clone() (with the child executing a separate handler function), or if it was used in the implementation of some higher-level process-spawning API, it would be a perfectly fine API?
>
> No, there is still a problem with SIGTSTP handling because we cannot atomically unmask the signal during execve. We need to unblock SIGTSTP before execve in the new process, but this means that it can get suspended by SIGTSTP. Consequently, the execve never happens and the original process is stuck in vfork:
>
> posix_spawn: parent can get stuck in uninterruptible sleep if child receives SIGTSTP early enough
> <https://inbox.sourceware.org/libc-help/2921668c-773e-465d-9480-0abb6f979bf9@www.fastmail.com/>
>
> More on the low-level side, it's difficult to make sure that execve gets a consistent snapshot of the environ vector. Both vfork and execve need to be async-signal-safe. Any locking or memory allocation (except for the stack …) persists in the original process after vfork returns. The
I think that's not entirely accurate; if you call set_robust_list() on a futex list, then call execve(), the futexes should be released once the process switches to a new MM, in begin_new_exec -> exec_mmap -> exec_mm_release -> futex_exec_release -> futex_cleanup -> exit_robust_list. So in theory you could use clone() with CLONE_VM and without CLONE_VFORK, and let the parent either wait for a futex that is released on exec, or somehow asynchronously check later whether the futex is still held... probably not the nicest building block but maybe workable? Though I guess it would fit more nicely if there was a "munmap() this range on exec" API...
> environ vector can be large, so making a copy on the stack is not ideal. It's even harder for getenv/setenv/unsetenv implementations that use locking instead of software transactional memory.
Makes sense, that kind of sounds like a pain inherent in being able to execute from signal handler context...
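
For illustration, here is a minimal sketch of the kind of wrapper discussed in the quoted text: clone() with CLONE_VM|CLONE_VFORK, where the child runs a separate handler function on its own stack instead of returning twice like vfork(). The names spawn_vfork_style, child_fn and struct spawn_args are hypothetical and not part of any libc or of this patch series. The sketch deliberately does no signal masking, so it does not address the SIGTSTP problem described above, and it assumes the caller passes stable argv/envp arrays.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical argument block; it lives on the parent's stack and is
 * visible to the child because of CLONE_VM. */
struct spawn_args {
    const char *path;
    char *const *argv;
    char *const *envp;
    int err; /* set by the child if execve() fails */
};

/* Runs in the child, on the separate stack, while the parent is suspended. */
static int child_fn(void *p)
{
    struct spawn_args *a = p;

    execve(a->path, a->argv, a->envp);
    a->err = errno; /* only reached if execve() failed */
    return 127;     /* becomes the child's exit status */
}

/* Returns the child's pid, or -1 with *err set to an errno value. */
static pid_t spawn_vfork_style(const char *path, char *const argv[],
                               char *const envp[], int *err)
{
    const size_t stack_size = 64 * 1024;
    struct spawn_args a = { path, argv, envp, 0 };
    void *stack;
    int saved_errno;
    pid_t pid;

    stack = mmap(NULL, stack_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) {
        *err = errno;
        return -1;
    }

    /* CLONE_VFORK suspends the parent until the child execs or exits,
     * so the stack and `a` stay valid for as long as the child uses them. */
    pid = clone(child_fn, (char *)stack + stack_size,
                CLONE_VM | CLONE_VFORK | SIGCHLD, &a);
    saved_errno = errno;
    munmap(stack, stack_size);

    if (pid < 0) {
        *err = saved_errno;
        return -1;
    }
    if (a.err != 0) {
        /* execve() failed in the child; reap it and report the error. */
        waitpid(pid, NULL, 0);
        *err = a.err;
        return -1;
    }
    *err = 0;
    return pid;
}
```

On success the caller still has to waitpid() for the returned pid. Reporting the execve() error through the shared argument block only works because the address space is shared and the parent is blocked until the child has either exec'd or exited.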