Re: [RFC PATCH v1 00/13] exec: add spawn templates for repeated executable startup

From: John Ericson <hidden>
Date: 2026-06-09 17:28:14
Also in: linux-arch, linux-doc, linux-fsdevel, linux-kselftest, linux-mm, lkml


On Tue, Jun 9, 2026, at 10:43 AM, Li Chen wrote:
> Hi Andy,
>
> ---- On Tue, 09 Jun 2026 08:01:57 +0800  Andy Lutomirski [off-list ref] wrote ---
> > [...]
> >
> > After contemplating this for a bit... why pidfd?  Doesn't a pidfd
> > refer to an actual process that is, or at least was, running?  This
> > new thing is a process that we are contemplating spawning.  I can
> > imagine that basically all pidfd APIs would be a bit confused by the
> > nonexistence of the process in question.
>
> Yes, I think that is a real concern.
>
> In my current local WIP I tried to keep that distinction explicit.
> pidfd_spawn_open() returns a pidfs-backed builder fd, not a normal pidfd
> referring to a process. The builder fd is allocated as an anonymous pidfs
> file with builder-specific file operations:
>
>     file = pidfs_alloc_anon_file("[pidfd_spawn]",
>                                  &pidfd_spawn_builder_fops, builder,
>                                  O_RDWR);

What does your builder fd point to, explicitly? For example, in my other reply I
talked about how it was "real" process state. In my FreeBSD patch, I found
there was already a status for a process "in exec", and I figured that
was clean to reuse for one of these "embryonic" processes that also hadn't
started running. I would reckon that Linux probably has some similar notions.

> and the normal pidfd helpers still reject it because it does not use the
> ordinary pidfd file operations:
>
>     struct pid *pidfd_pid(const struct file *file)
>     {
>         if (file->f_op != &pidfs_file_operations)
>             return ERR_PTR(-EBADF);
>         return file_inode(file)->i_private;
>     }
>
> So the current split is:
>
>     builder_fd = pidfd_spawn_open(...);       /* builder object */
>     pidfd_config(builder_fd, ...);
>     child_pidfd = pidfd_spawn_run(builder_fd, ...); /* real pidfd */
>
> Only the last fd is a normal pidfd for an actual child process. The builder
> fd is only accepted by the builder operations.
>
> This avoids having to define what waitid(P_PIDFD), pidfd_send_signal(),
> pidfd_getfd(), poll(), etc. mean before the process exists.

I wouldn't be so sure this is necessary/good. For example, I think it could
make sense to wait on a process that has yet to be started; one just waits for
both the process to start and the process to exit. Obviously a blocking syscall
in the thread that is spawning the process is not useful, but the asynchronous
poll variation seems fine.

As long as there is real process state here, it shouldn't be too hard to
implement.
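
For reference, here is a minimal sketch of what asynchronous waiting looks like
with an ordinary pidfd today: the fd becomes readable under poll() once the
process has terminated. It only uses existing Linux interfaces (fork(),
the pidfd_open syscall, poll(), waitpid()); the helper names sys_pidfd_open()
and poll_child_exit() are made up for this illustration. Polling an embryonic
process would presumably use the same poll() call on an fd obtained before the
start step, but that part is an assumption and is not shown, since no such API
exists yet.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thin wrapper (hypothetical name): call the pidfd_open syscall directly,
 * since older glibc versions do not declare pidfd_open(). */
static int sys_pidfd_open(pid_t pid, unsigned int flags)
{
    return (int)syscall(SYS_pidfd_open, pid, flags);
}

/*
 * Fork a child that exits with `code`, wait for its termination via
 * poll() on a pidfd, then reap it. Returns the child's exit status,
 * or -1 on any error (including pidfd_open being unavailable).
 */
static int poll_child_exit(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    int pidfd = sys_pidfd_open(pid, 0);
    if (pidfd < 0) {
        waitpid(pid, NULL, 0);      /* still reap the child */
        return -1;
    }

    /* A pidfd reports POLLIN once the process has terminated. */
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int ready;
    do {
        ready = poll(&pfd, 1, -1);
    } while (ready < 0 && errno == EINTR);
    close(pidfd);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (ready != 1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In an event loop the pidfd would simply be one more entry in the poll set, so
the spawning thread never blocks on the child alone.
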

> The downside is that it adds a separate open-style entry point and is less
> uniform than the pidfd_open(0, PIDFD_EMPTY) spelling Christian sketched.

I do think there is no point in having two file descriptors. The file descriptor
that previously referred to the builder/embryonic process can then refer to the
real process, right?

> If people think there is a better way to represent the pre-spawn builder
> state, or if the preference is to integrate it directly into pidfd_open()
> with an explicit empty/future-pidfd state, I would be happy to discuss that.

Hope the above answers your question? I suppose my ideas lean more on the
"future" than "empty" side --- there is indeed a thread in the thread group,
with real VM/namespace/file descriptor etc. state. Moreover, state gets
initialized before the process is started, so the actual start is a pretty
lightweight step of just letting the scheduler know the now-ready process can
be scheduled. The only thing that distinguishes the embryonic process from a
real one is simply that it isn't running --- i.e. isn't (yet) available to be
scheduled --- so the pidfd holders are free to poke at its state.
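
The single-handle lifecycle described above can be sketched as a toy userspace
state machine: one handle starts out embryonic, accepts configuration only in
that state, and keeps referring to the same object once it is started. This is
purely a model; struct proc_handle and the proc_*() functions are hypothetical
names that correspond to nothing in the kernel or in the patch series, and the
"nice" field is just a stand-in for any pre-start state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Lifecycle of the modelled process behind a single handle. */
enum proc_state { PROC_EMBRYONIC, PROC_RUNNING, PROC_EXITED };

struct proc_handle {
    enum proc_state state;
    int nice;       /* stand-in for state that is set up before start */
    int exit_code;  /* valid only in PROC_EXITED */
};

static void proc_init(struct proc_handle *p)
{
    assert(p != NULL);
    p->state = PROC_EMBRYONIC;
    p->nice = 0;
    p->exit_code = 0;
}

/* Holders may poke at the state only while the process is not running. */
static int proc_set_nice(struct proc_handle *p, int nice)
{
    if (p->state != PROC_EMBRYONIC)
        return -EBUSY;
    p->nice = nice;
    return 0;
}

/* Starting is a cheap state flip: everything was initialized beforehand. */
static int proc_start(struct proc_handle *p)
{
    if (p->state != PROC_EMBRYONIC)
        return -EINVAL;
    p->state = PROC_RUNNING;
    return 0;
}

/* Only a running process can exit. */
static int proc_exit(struct proc_handle *p, int code)
{
    if (p->state != PROC_RUNNING)
        return -ESRCH;
    p->state = PROC_EXITED;
    p->exit_code = code;
    return 0;
}

/* poll()-like readiness: a waiter armed before start covers start and exit. */
static bool proc_poll_readable(const struct proc_handle *p)
{
    return p->state == PROC_EXITED;
}
```

In this model the same handle is used before and after proc_start(), which is
the one-descriptor design argued for here, as opposed to a builder fd plus a
separate child pidfd.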

Cheers,

John