Re: [PATCH v2] Implement close-on-fork
From: Eric Dumazet <edumazet@google.com>
Date: 2020-05-15 15:31:07
Also in:
linux-alpha, linux-arch, linux-fsdevel, lkml, sparclinux
On Fri, May 15, 2020 at 8:23 AM Nate Karstens [off-list ref] wrote:
Series of 4 patches to implement close-on-fork. Tests have been published to https://github.com/nkarstens/ltp/tree/close-on-fork and cover close-on-fork functionality in the following syscalls: * accept(4) * dup3(2) * fcntl(2) * open(2) * socket(2) * socketpair(2) * unshare(2) Addresses underlying issue in that there is no way to prevent a fork() from duplicating a file descriptor. The existing close-on-exec flag partially-addresses this by allowing the parent process to mark a file descriptor as exclusive to itself, but there is still a period of time the failure can occur because the auto-close only occurs during the exec(). One manifestation of this is a race conditions in system(), which (depending on the implementation) is non-atomic in that it first calls a fork() and then an exec(). This functionality was approved by the Austin Common Standards Revision Group for inclusion in the next revision of the POSIX standard (see issue 1318 in the Austin Group Defect Tracker). --- This is v2 of the change. See https://lkml.org/lkml/2020/4/20/113 for the original work. Thanks to everyone who provided comments on the first series of patches. Here are replies to specific comments:quoted
I suggest we group the two bits of a file (close_on_exec, close_on_fork) together, so that we do not have to dirty two separate cache lines.I could be mistaken, but I don't think this would improve efficiency. The close-on-fork and close-on-exec flags are read at different times. If you assume separate syscalls for fork and exec then there are several switches between when the two flags are read. In addition, the close-on-fork flags in the new process must be cleared, which will be much harder if the flags are interleaved.
:/ Fast path in big and performance sensitive applications is not fork() and/or exec(). This is open()/close() and others (socket(), accept(), ...) We do not want them to access extra cache lines for this new feature. Sorry, I will say no to these patches in their current form.