Thread (45 messages) 45 messages, 4 authors, 2025-08-07

[PATCH 05/10] fsconfig.2: document 'new' mount api

From: Aleksa Sarai <hidden>
Date: 2025-08-05 16:26:40
Subsystem: the rest · Maintainer: Linus Torvalds

This is loosely based on the original documentation written by David
Howells and later maintained by Christian Brauner, but has been
rewritten to be more from a user perspective (as well as fixing a few
critical mistakes).

Co-developed-by: David Howells <dhowells@redhat.com>
Co-developed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Aleksa Sarai <redacted>
---
 man/man2/fsconfig.2 | 446 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 446 insertions(+)
diff --git a/man/man2/fsconfig.2 b/man/man2/fsconfig.2
new file mode 100644
index 000000000000..572e0fc5c1b9
--- /dev/null
+++ b/man/man2/fsconfig.2
@@ -0,0 +1,446 @@
+.\" Copyright, the authors of the Linux man-pages project
+.\"
+.\" SPDX-License-Identifier: Linux-man-pages-copyleft
+.\"
+.TH fsconfig 2 (date) "Linux man-pages (unreleased)"
+.SH NAME
+fsconfig \- configure new or existing filesystem contexts
+.SH LIBRARY
+Standard C library
+.RI ( libc ,\~ \-lc )
+.SH SYNOPSIS
+.nf
+.BR "#include <sys/mount.h>"
+.BR "#include <unistd.h>"
+.P
+.BI "int fsconfig(int " fd ", unsigned int " cmd ",
+.BI "             const char *" key ", const void *" value ", int " aux ");
+.fi
+.SH DESCRIPTION
+The
+.BR fsconfig ()
+system call is part of the suite of file descriptor based mount facilities in
+Linux.
+.P
+.BR fsconfig ()
+is used to supply parameters to and issue commands against the filesystem
+configuration context associated with the file descriptor
+.IR fd .
+Filesystem configuration contexts can be created with
+.BR fsopen (2)
+or instantiated from an existing mount with
+.BR fspick (2).
+.P
+The
+.I cmd
+argument indicates the command to be issued.
+Some commands simply supply parameters to the context (equivalent to mount
+options specified with
+.BR mount (2)),
+while others are meta-operations on the filesystem context.
+The list of valid
+.I cmd
+values are:
+.RS
+.TP
+.B FSCONFIG_SET_FLAG
+Set the flag parameter named by
+.IR key .
+.I value
+must be NULL, and
+.I aux
+must be 0.
+.TP
+.B FSCONFIG_SET_STRING
+Set the string parameter named by
+.I key
+to the value specified by
+.IR value .
+.I value
+points to a NUL-terminated string, and
+.I aux
+must be 0.
+.TP
+.B FSCONFIG_SET_BINARY
+Set the parameter named by
+.I key
+to the binary blob specified by
+.IR value .
+.I value
+points to the start of a buffer that is
+.I aux
+bytes in length.
+.TP
+.B FSCONFIG_SET_FD
+Set the file parameter named by
+.I key
+to the file referenced by the file descriptor
+.IR aux .
+.I value
+must be NULL.
+.IP
+You may also use
+.B FSCONFIG_SET_STRING
+for file parameters, with
+.I value
+set to a NUL-terminated string containing a base-10 representation of the file
+descriptor number.
+This mechanism is primarily intended for compatibility with older
+.BR mount (2)-based
+programs.
+.TP
+.B FSCONFIG_SET_PATH
+Set the path parameter named by
+.I key
+to the object at a provided path, resolved in a similar manner to
+.BR openat (2).
+.I value
+points to a NUL-terminated pathname string, and
+.I aux
+is equivalent to the
+.I dirfd
+argument to
+.BR openat (2).
+.IP
+You may also use
+.B FSCONFIG_SET_STRING
+for path parameters, the behaviour of which is equivalent to
+.B FSCONFIG_SET_PATH
+with
+.I aux
+set to
+.BR AT_FDCWD .
+.TP
+.B FSCONFIG_SET_PATH_EMPTY
+As with
+.BR FSCONFIG_SET_PATH ,
+except that if
+.I value
+is an empty string, then the file descriptor specified by
+.I aux
+may be any type of file (not just a directory) and will be used as the path
+parameter value, equivalent to the behaviour of
+.B AT_EMPTY_PATH
+with other "*at()" system calls.
+.IP
+Note that this behaviour with empty paths is distinct in some subtle ways to
+.BR FSCONFIG_SET_FD .
+.B FSCONFIG_SET_FD
+indicates that the underlying file for the file descriptor
+.I aux
+should be used as the parameter value directly;
+.B FSCONFIG_SET_PATH_EMPTY
+indicates that the underlying file for the file descriptor
+.I aux
+should be re-opened by the filesystem driver, and the newly created file
+description should be used as the parameter value.
+This can result in slightly different behaviour when dealing with special files
+or files sourced from pseudofilesystems.
+Filesystems may also choose to only support one kind of parameter, and so a
+parameter that accepts
+.B FSCONFIG_SET_FD
+may not work with
+.B FSCONFIG_SET_PATH
+(or vice-versa).
+.TP
+.B FSCONFIG_CMD_CREATE
+This command instructs the filesystem driver to instantiate an instance of the
+filesystem in the kernel with the parameters set in the filesystem
+configuration context referenced by the file descriptor
+.IR fd .
+.IR key " and " value
+must be NULL, and
+.I aux
+must be 0.
+.IP
+If this operation succeeds, the file descriptor
+.I fd
+now references the created filesystem instance, and is placed into a special
+"needs-mount" mode that allows you to use
+.BR fsmount (2)
+to create a mount object from the filesystem instance.
+.IP
+This is intended for use with filesystem configuration contexts created with
+.BR fsopen (2).
+In order to create a filesystem instance, the calling process must have the
+.B CAP_SYS_ADMIN
+capability.
+.IP
+Note that the Linux kernel reuses filesystem instances for many filesystems, so
+(depending on the filesystem being configured and parameters used) it is
+possible the the filesystem instance "created" by
+.B FSCONFIG_CMD_CREATE
+may in fact be a reference to an existing filesystem instance in the kernel.
+The kernel will attempt to merge the specified parameters of this filesystem
+configuration context with those of the filesystem instance being reused, but
+some parameters may be silently ignored.
+.IP
+Programs that need to ensure that they create a new filesystem instance with
+specific parameters (notably, security-related parameters such POSIX ACLs -- as
+described in
+.BR acl (5))
+should use
+.B FSCONFIG_CMD_CREATE_EXCL
+instead.
+.TP
+.BR FSCONFIG_CMD_CREATE_EXCL " (since Linux 6.6)"
+.\" commit 22ed7ecdaefe0cac0c6e6295e83048af60435b13
+As with
+.BR FSCONFIG_CMD_CREATE ,
+except that the kernel is instructed to create a new filesystem instance
+("superblock" in kernel parlance) rather than reusing an existing one.
+.IP
+If this is not possible (such as with disk-backed filesystems where multiple
+filesystem instances using the same filesystem driver and writing to the same
+underlying device could result in data corruption), this operation will incur
+an
+.B EBUSY
+error.
+.IP
+As a result (unlike
+.BR FSCONFIG_CMD_CREATE ),
+if this command succeeds then the calling process can be sure that all of the
+parameters successfully configured with
+.BR fsconfig ()
+will actually apply to the created filesystem instance.
+.TP
+.B FSCONFIG_CMD_RECONFIGURE
+This command instructs the filesystem driver to apply the parameters set in
+this filesystem configuration context to an already existing filesystem
+instance.
+.IP
+This is primarily intended for use with
+.BR fspick (2),
+but may also be used to modify the parameters of filesystem instance after
+.BR FSCONFIG_CMD_CREATE
+was used to create it.
+In order to reconfigure an existing filesystem instance, the calling process
+must have the
+.B CAP_SYS_ADMIN
+capability.
+.IP
+Once this operation succeeds, the filesystem context is reset but remains in
+reconfiguration mode and thus can be used for subsequent
+.B FSCONFIG_CMD_RECONFIGURE
+commands.
+.SH RETURN VALUE
+On success,
+.BR fsconfig ()
+returns 0.
+On error, \-1 is returned, and
+.I errno
+is set to indicate the error.
+.SH ERRORS
+If an error occurs, the filesystem driver may provide additional information
+about the error through the message retrieval interface for filesystem
+configuration contexts.
+This additional information can be retrieved at any time by calling
+.BR read (2)
+on the filesystem instance or filesystem configuration context referenced by
+the file descriptor
+.IR fd .
+(See the "Message Retrieval Interface" subsection in
+.BR fsopen (2)
+for more details on the message format.)
+.P
+Even after an error occurs, the filesystem configuration context is
+.I not
+invalidated, and thus can still be used with other
+.BR fsconfig ()
+commands.
+This means that users can probe support for mount parameters on a per-parameter
+basis, and adjust which parameters they wish to set.
+.P
+The error values given below result from filesystem type independent errors.
+Each filesystem type may have its own special errors and its own special
+behavior.
+See the Linux kernel source code for details.
+.P
+.TP
+.B EACCES
+A component of a path provided as a path parameter was not searchable.
+(See also
+.BR path_resolution (7).)
+.TP
+.B EACCES
+Mounting a read-only filesystem was attempted without specifying the
+.RB ' ro '
+flag parameter.
+.TP
+.B EACCES
+A specified block device parameter is located on a filesystem mounted with the
+.B MS_NODEV
+option.
+.TP
+.B EBADF
+The file descriptor given by
+.I fd
+(or possibly by
+.IR aux ,
+depending on the command) is invalid.
+.TP
+.B EBUSY
+The filesystem context attached to
+.I fd
+is in the wrong state for the given command.
+.TP
+.B EBUSY
+The filesystem instance cannot be reconfigured as read-only with
+.B FSCONFIG_CMD_RECONFIGURE
+because it still holds files open for writing.
+.TP
+.B EBUSY
+A new filesystem instance was requested with
+.B FSCONFIG_CMD_CREATE_EXCL
+but a matching superblock already existed.
+.TP
+.B EFAULT
+One of the pointer arguments points to a location outside the calling process's
+accessible address space.
+.TP
+.B EINVAL
+.I fd
+does not refer to a filesystem configuration context or filesystem instance.
+.TP
+.B EINVAL
+One of the values of
+.IR name ", " value ", and/or " aux
+were set to non-zero value when
+.I cmd
+required that they be zero (or NULL).
+.TP
+.B EINVAL
+The parameter named by
+.I name
+cannot be set using the type specified with
+.IR cmd .
+.TP
+.B EINVAL
+One of the source parameters referred to an invalid superblock.
+.TP
+.B ELOOP
+Too many links encountered during pathname resolution of a path argument.
+.TP
+.B ENAMETOOLONG
+A path argument was longer than
+.BR PATH_MAX .
+.TP
+.B ENOENT
+A path argument had a non-existent component.
+.TP
+.B ENOENT
+A path argument is an empty string, but
+.I cmd
+is not
+.BR FSCONFIG_SET_PATH_EMPTY .
+.TP
+.B ENOMEM
+The kernel could not allocate sufficient memory to complete the operation.
+.TP
+.B ENOTBLK
+The parameter named by
+.I name
+must be a block device, but the provided parameter value was not a block
+device.
+.TP
+.B ENOTDIR
+A component of the path prefix of a path argument was not a directory.
+.TP
+.B EOPNOTSUPP
+The command given by
+.I cmd
+is not valid.
+.TP
+.B ENXIO
+The major number of a block device parameter is out of range.
+.TP
+.B EPERM
+The command given by
+.I cmd
+was
+.BR FSCONFIG_CMD_CREATE ", " FSCONFIG_CMD_CREATE_EXCL ", or " \
+FSCONFIG_CMD_RECONFIGURE
+but the calling process does not have the required
+.B CAP_SYS_ADMIN
+capability.
+.SH STANDARDS
+Linux.
+.SH HISTORY
+Linux 5.2.
+.\" commit ecdab150fddb42fe6a739335257949220033b782
+Glibc 2.36.
+.SH EXAMPLES
+To illustrate the different kinds of flags that can be configured with
+.BR fsconfig (2),
+here are a few examples of some different filesystems being created:
+.P
+.in +4n
+.EX
+int sfd = fsopen("tmpfs", FSOPEN_CLOEXEC);
+fsconfig(sfd, FSCONFIG_SET_FLAG, "inode64", NULL, 0);
+fsconfig(sfd, FSCONFIG_SET_STRING, "uid", "1234", 0);
+fsconfig(sfd, FSCONFIG_SET_STRING, "huge", "never", 0);
+fsconfig(sfd, FSCONFIG_SET_FLAG, "casefold", NULL, 0);
+fsconfig(sfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+int mfd = fsmount(sfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_NODEV | MOUNT_ATTR_NOEXEC);
+move_mount(mfd, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH);
+\&
+int sfd = fsopen("erofs", FSOPEN_CLOEXEC);
+fsconfig(sfd, FSCONFIG_SET_STRING, "source", "/dev/loop0", 0);
+fsconfig(sfd, FSCONFIG_SET_FLAG, "acl", NULL, 0);
+fsconfig(sfd, FSCONFIG_SET_FLAG, "user_xattr", NULL, 0);
+fsconfig(sfd, FSCONFIG_CMD_CREATE_EXCL, NULL, NULL, 0);
+int mfd = fsmount(sfd, FSMOUNT_CLOEXEC, MOUNT_ATTR_NOSUID);
+move_mount(mfd, "", AT_FDCWD, "/mnt", MOVE_MOUNT_F_EMPTY_PATH);
+\&
+int lowerdirfd = open("/overlays/mine/lower1", O_DIRECTORY | O_CLOEXEC);
+int sfd = fsopen("overlay", FSOPEN_CLOEXEC);
+/* "lowerdir+" appends to the lower dir stack */
+fsconfig(sfd, FSCONFIG_SET_FD, "lowerdir+", NULL, lowerdirfd);
+fsconfig(sfd, FSCONFIG_SET_STRING, "lowerdir+", "/overlays/mine/lower2", 0);
+.\" fsconfig(sfd, FSCONFIG_SET_PATH, "lowerdir+", "/overlays/mine/lower3", AT_FDCWD);
+.\" fsconfig(sfd, FSCONFIG_SET_PATH_EMPTY, "lowerdir+", "", lowerdirfd);
+fsconfig(sfd, FSCONFIG_SET_STRING, "xino", "auto", 0);
+fsconfig(sfd, FSCONFIG_SET_STRING, "nfs_export", "off", 0);
+fsconfig(sfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+int mfd = fsmount(sfd, FSMOUNT_CLOEXEC, 0);
+move_mount(mfd, "", AT_FDCWD, "/mnt/overlay", MOVE_MOUNT_F_EMPTY_PATH);
+\&
+int nsfd = open("/proc/self/ns/pid", O_PATH);
+int nsdirfd = open("/proc/1/ns", O_DIRECTORY);
+int sfd = fsopen("proc", FSOPEN_CLOEXEC);
+/* "pidns" changes the value each time. */
+fsconfig(sfd, FSCONFIG_SET_PATH, "pidns", "/proc/self/ns/pid", AT_FDCWD);
+fsconfig(sfd, FSCONFIG_SET_PATH, "pidns", "pid", NULL, nsdirfd);
+fsconfig(sfd, FSCONFIG_SET_PATH_EMPTY, "pidns", "", nsfd);
+fsconfig(sfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd);
+fsconfig(sfd, FSCONFIG_CMD_CREATE, NULL, NULL, 0);
+int mfd = fsmount(sfd, FSMOUNT_CLOEXEC, 0);
+move_mount(mfd, "", AT_FDCWD, "/proc", MOVE_MOUNT_F_EMPTY_PATH);
+.EE
+.in
+.P
+And here is an example of how
+.BR fspick (2)
+can be used with
+.BR fsconfig ()
+to reconfigure the parameters of an existing mount:
+.P
+.in +4n
+.EX
+int sfd = fspick(AT_FDCWD, "/proc", FSPICK_CLOEXEC);
+fsconfig(sfd, FSCONFIG_SET_STRING, "hidepid", "ptraceable", 0);
+fsconfig(sfd, FSCONFIG_SET_STRING, "subset", "pid", 0);
+fsconfig(sfd, FSCONFIG_CMD_RECONFIGURE, NULL, NULL, 0);
+.EE
+.in
+.SH SEE ALSO
+.BR mount (2),
+.BR fsopen (2),
+.BR fsmount (2),
+.BR fspick (2),
+.BR open_tree (2),
+.BR move_mount (2),
+.BR mount_setattr (2),
+.BR mount_namespaces (7)
+
-- 
2.50.1
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help