RE: [PATCH 1/4] fs: Implement close-on-fork
From: Karstens, Nate <hidden>
Date: 2020-05-01 14:45:40
Also in:
linux-arch, linux-fsdevel, lkml, netdev, sparclinux
Eric, Thanks for the suggestion. I looked into it and noticed that do_close_on_exec() appears to have some optimizations as well:
set = fdt->close_on_exec[i]; if (!set) continue;
If we interleave the close-on-exec and close-on-fork flags then this optimization will have to be removed. Do you have a sense of which optimization provides the most benefit? I noticed a couple of other issues with the original patch that I will need to investigate or rework: 1) I'm not sure dup_fd() is the best place to check the close-on-fork flag. For example, the ksys_unshare() > unshare_fd() > dup_fd() execution path seems suspect. I will either add a parameter to the function indicating if the flag should be checked or do a separate function, like do_close_on_fork(). 2) If the close-on-fork flag is set, then __clear_open_fd() should be called instead of just __clear_bit(). This will ensure that fdt->full_fds_bits() is updated. 3) Need to investigate if the close-on-fork (or close-on-exec) flags need to be cleared when the file is closed as part of the close-on-fork execution path. Others -- I will respond to feedback outside of implementation details in a separate message. Thanks, Nate -----Original Message----- From: Eric Dumazet <redacted> Sent: Monday, April 20, 2020 05:26 To: Karstens, Nate <redacted>; Alexander Viro <viro@zeniv.linux.org.uk>; Jeff Layton <jlayton@kernel.org>; J. Bruce Fields <redacted>; Arnd Bergmann <arnd@arndb.de>; Richard Henderson <redacted>; Ivan Kokshaysky <redacted>; Matt Turner <mattst88@gmail.com>; James E.J. Bottomley <James.Bottomley@HansenPartnership.com>; Helge Deller <deller@gmx.de>; David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; linux-fsdevel@vger.kernel.org; linux-arch@vger.kernel.org; linux-alpha@vger.kernel.org; linux-parisc@vger.kernel.org; sparclinux@vger.kernel.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org Cc: Changli Gao <redacted> Subject: Re: [PATCH 1/4] fs: Implement close-on-fork CAUTION - EXTERNAL EMAIL: Do not click any links or open any attachments unless you trust the sender and know the content is safe. On 4/20/20 12:15 AM, Nate Karstens wrote:
The close-on-fork flag causes the file descriptor to be closed atomically in the child process before the child process returns from fork(). Implement this feature and provide a method to get/set the close-on-fork flag using fcntl(2). This functionality was approved by the Austin Common Standards Revision Group for inclusion in the next revision of the POSIX standard (see issue 1318 in the Austin Group Defect Tracker).
Oh well... yet another feature slowing down a critical path.
quoted hunk ↗ jump to hunk
Co-developed-by: Changli Gao <redacted> Signed-off-by: Changli Gao <redacted> Signed-off-by: Nate Karstens <redacted> --- fs/fcntl.c | 2 ++ fs/file.c | 50 +++++++++++++++++++++++++- include/linux/fdtable.h | 7 ++++ include/linux/file.h | 2 ++ include/uapi/asm-generic/fcntl.h | 5 +-- tools/include/uapi/asm-generic/fcntl.h | 5 +-- 6 files changed, 66 insertions(+), 5 deletions(-)diff --git a/fs/fcntl.c b/fs/fcntl.c index 2e4c0fa2074b..23964abf4a1a 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c@@ -335,10 +335,12 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg, break; case F_GETFD: err = get_close_on_exec(fd) ? FD_CLOEXEC : 0; + err |= get_close_on_fork(fd) ? FD_CLOFORK : 0; break; case F_SETFD: err = 0; set_close_on_exec(fd, arg & FD_CLOEXEC); + set_close_on_fork(fd, arg & FD_CLOFORK); break; case F_GETFL: err = filp->f_flags;diff --git a/fs/file.c b/fs/file.c index c8a4e4c86e55..de7260ba718d 100644 --- a/fs/file.c +++ b/fs/file.c@@ -57,6 +57,8 @@ static void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt, memset((char *)nfdt->open_fds + cpy, 0, set); memcpy(nfdt->close_on_exec, ofdt->close_on_exec, cpy); memset((char *)nfdt->close_on_exec + cpy, 0, set); + memcpy(nfdt->close_on_fork, ofdt->close_on_fork, cpy); + memset((char *)nfdt->close_on_fork + cpy, 0, set);
I suggest we group the two bits of a file (close_on_exec, close_on_fork) together, so that we do not have to dirty two separate cache lines.
Otherwise we will add yet another cache line miss at every file opening/closing for processes with big file tables.
Ie having a _single_ bitmap array, even bit for close_on_exec, odd bit for close_on_fork
static inline void __set_close_on_exec(unsigned int fd, struct fdtable *fdt) {
__set_bit(fd * 2, fdt->close_on_fork_exec); }
static inline void __set_close_on_fork(unsigned int fd, struct fdtable *fdt) {
__set_bit(fd * 2 + 1, fdt->close_on_fork_exec); }
Also the F_GETFD/F_SETFD implementation must use a single function call, to not acquire the spinlock twice.