Re: [RFC PATCH 2/3] add statmnt(2) syscall
From: Christian Brauner <brauner@kernel.org>
Date: 2023-09-18 16:20:36
Also in:
linux-fsdevel, linux-man, linux-security-module, lkml
Atomicity of getting a snapshot of the current mount tree with all of its attributes was never guaranteed, although reading /proc/self/mountinfo into a sufficiently large buffer would work that way. However, I don't see why mount trees would require stronger guarantees than dentry trees (for which we have basically none).
So atomicity was never put forward as a requirement. In that session/recording I explicitly stated that we won't guarantee atomicity. And systemd agreed with this. So I think we're all on the same page.
An even more type-clean interface:

struct statmnt *statmnt(u64 mnt_id, u64 mask, void *buf, size_t bufsize, unsigned int flags);

The kernel would return a fully initialized struct with the numeric as well as the string fields filled in. That part is trivial for userspace to deal with.
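To make the "struct plus strings in one caller buffer" idea concrete, here is a minimal userspace sketch of the packing such a call could do. Everything in it is hypothetical: `struct statmnt_sketch` is a trimmed-down stand-in rather than the proposed UAPI struct, `pack_statmnt()` merely simulates what the kernel side might write, and reporting a too-small buffer with `EOVERFLOW` is an assumed choice of error code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down result layout; not a kernel ABI. */
struct statmnt_sketch {
	uint64_t mnt_id;
	const char *mountpoint;	/* points into the caller's buffer, after the struct */
	const char *fstype;	/* likewise */
};

/*
 * Hypothetical stand-in for the kernel side of statmnt(): write the fixed
 * struct at the start of buf and append its strings right after it, so a
 * single caller-supplied buffer holds everything. The caller must pass a
 * suitably aligned buffer (e.g. one from malloc()). Fails with EOVERFLOW
 * (an assumed error code) if bufsize is too small.
 */
static struct statmnt_sketch *pack_statmnt(void *buf, size_t bufsize,
					   uint64_t mnt_id,
					   const char *mountpoint,
					   const char *fstype)
{
	size_t mp_len = strlen(mountpoint) + 1;	/* include the NUL */
	size_t fs_len = strlen(fstype) + 1;
	struct statmnt_sketch *sm;
	char *strings;

	if (bufsize < sizeof(*sm) + mp_len + fs_len) {
		errno = EOVERFLOW;
		return NULL;
	}

	sm = buf;
	strings = (char *)buf + sizeof(*sm);
	memcpy(strings, mountpoint, mp_len);
	memcpy(strings + mp_len, fstype, fs_len);

	sm->mnt_id = mnt_id;
	sm->mountpoint = strings;
	sm->fstype = strings + mp_len;
	return sm;
}
```

Userspace then reads `sm->mountpoint` like any other field; on the assumed `EOVERFLOW` it would retry with a larger buffer.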
I really would prefer a properly typed struct and that's what everyone was happy with in the session as well. So I would not like to change the main parameters.
Plus, the format for how to return arbitrary filesystem mount options warrants a separate discussion imho, as that's not really vfs-level information.

Okay. Let's take fs options out of this.
Thanks.
That leaves:
- fs type and optionally subtype
Since subtype is FUSE-specific, it might be better to move it to the filesystem-specific options imho.
- root of mount within fs
- mountpoint path

The type and subtype are naturally limited to sane sizes; those are not an issue.
What's the limit for fstype actually? I don't think there is one.
There's one by chance but not by design afaict?
Maybe crazy idea:
That magic number thing that we do in include/uapi/linux/magic.h:
is there a good reason for it, or why don't we just add a proper,
simple enum:
enum {
	FS_TYPE_ADFS	= 1,
	FS_TYPE_AFFS	= 2,
	FS_TYPE_AFS	= 3,
	FS_TYPE_AUTOFS	= 4,
	FS_TYPE_EXT2	= 5,
	FS_TYPE_EXT3	= 6,
	FS_TYPE_EXT4	= 7,
	/* ... */
	FS_TYPE_MAX
};
that we start returning from statmount(). We could still return both the
old and the new fstype? It always felt a bit odd that fs developers
just select a magic number.
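One way the enum could be derived is from the name a filesystem registers itself under, rather than from its magic number. The sketch below is an assumption, not part of the proposal: `enum fs_type`, the added `FS_TYPE_UNKNOWN = 0` sentinel and `fs_type_from_name()` are hypothetical names, and the table only covers the entries listed above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical enum following the proposal; FS_TYPE_UNKNOWN is an added sentinel. */
enum fs_type {
	FS_TYPE_UNKNOWN	= 0,
	FS_TYPE_ADFS	= 1,
	FS_TYPE_AFFS	= 2,
	FS_TYPE_AFS	= 3,
	FS_TYPE_AUTOFS	= 4,
	FS_TYPE_EXT2	= 5,
	FS_TYPE_EXT3	= 6,
	FS_TYPE_EXT4	= 7,
	FS_TYPE_MAX
};

/* Hypothetical lookup from a registered filesystem type name to the enum. */
static enum fs_type fs_type_from_name(const char *name)
{
	static const struct {
		const char *name;
		enum fs_type type;
	} table[] = {
		{ "adfs", FS_TYPE_ADFS },   { "affs", FS_TYPE_AFFS },
		{ "afs", FS_TYPE_AFS },     { "autofs", FS_TYPE_AUTOFS },
		{ "ext2", FS_TYPE_EXT2 },   { "ext3", FS_TYPE_EXT3 },
		{ "ext4", FS_TYPE_EXT4 },
	};

	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (strcmp(table[i].name, name) == 0)
			return table[i].type;
	return FS_TYPE_UNKNOWN;
}
```

A name-based enum also separates cases the magic number cannot: EXT2_SUPER_MAGIC, EXT3_SUPER_MAGIC and EXT4_SUPER_MAGIC in include/uapi/linux/magic.h are all 0xEF53.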
For paths, the evolution of the relevant system/library calls was:

char *getwd(char buf[PATH_MAX]);
char *getcwd(char *buf, size_t size);
char *get_current_dir_name(void);

It started out using a fixed-size buffer, then a variable-sized buffer, then a buffer allocated automatically by the library, hiding the need to resize on overflow. The latest style would suit the statmnt() call as well, if we worry about the pleasantness of the API.
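The resize-on-overflow step that get_current_dir_name() hides looks roughly like this when written against getcwd(), which fails with ERANGE when the buffer is too small. `getcwd_alloc()` is a hypothetical helper name; a library statmnt() wrapper could presumably hide the same kind of loop, assuming the syscall reports a too-small buffer in a similar way.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a malloc()ed copy of the current working
 * directory, doubling the buffer while getcwd() fails with ERANGE.
 * The caller frees the result. Returns NULL on any other error.
 */
static char *getcwd_alloc(void)
{
	size_t size = 64;
	char *buf = NULL;

	for (;;) {
		char *tmp = realloc(buf, size);

		if (!tmp) {
			free(buf);
			return NULL;
		}
		buf = tmp;
		if (getcwd(buf, size))
			return buf;	/* fits: done */
		if (errno != ERANGE) {
			free(buf);	/* real error, not just "too small" */
			return NULL;
		}
		size *= 2;		/* too small: grow and retry */
	}
}
```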
So, can we then do the following struct:
#include <linux/types.h>

struct statmnt {
	__u64 mask;		/* What results were written [uncond] */
	__u32 sb_dev_major;	/* Device ID */
	__u32 sb_dev_minor;
	__u64 sb_magic;		/* ..._SUPER_MAGIC */
	__u32 sb_flags;		/* MS_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */
	__u32 __spare1;
	__u64 mnt_id;		/* Unique ID of mount */
	__u64 mnt_parent_id;	/* Unique ID of parent (for root == mnt_id) */
	__u32 mnt_id_old;	/* Reused IDs used in proc/.../mountinfo */
	__u32 mnt_parent_id_old;
	__u64 mnt_attr;		/* MOUNT_ATTR_... */
	__u64 mnt_propagation;	/* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */
	__u64 mnt_peer_group;	/* ID of shared peer group */
	__u64 mnt_master;	/* Mount receives propagation from this ID */
	__u64 propagate_from;	/* Propagation from in current namespace */
	__aligned_u64 mountpoint;	/* Userspace buffer for the mountpoint path */
	__u32 mountpoint_len;		/* Size of that buffer */
	__u32 __spare2;			/* Explicit padding: no implicit hole before the next u64 */
	__aligned_u64 mountroot;	/* Userspace buffer for the root of the mount */
	__u32 mountroot_len;		/* Size of that buffer */
	__u32 __spare3;			/* Explicit padding */
	__u64 __spare[20];
};
Userspace already knows how to deal with that because of bpf and other
structs (e.g., both systemd and LXC have ptr_to_u64() helpers and so
on). Libmount and glibc can hide this away internally as well.
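For readers who haven't seen the pattern, here is a minimal userspace sketch of passing a buffer through a fixed-width 64-bit field. `ptr_to_u64()` mirrors the helper style mentioned above; `struct statmnt_strings` and `fill_mountpoint()` are hypothetical names, and the latter only simulates the kernel side, which in a real implementation would use u64_to_user_ptr() and copy_to_user() instead of memcpy().

```c
#include <stdint.h>
#include <string.h>

/* Carry a user pointer in a u64 so the layout is the same for 32- and 64-bit userspace. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

/* Hypothetical subset of the proposed struct: (address, length) pair for one string. */
struct statmnt_strings {
	uint64_t mountpoint;		/* address of a userspace buffer */
	uint32_t mountpoint_len;	/* size of that buffer in bytes */
	uint32_t pad;			/* explicit padding */
};

/*
 * Hypothetical simulation of the kernel side: copy path (with its NUL)
 * into the buffer described by the struct. Returns 0, or -1 if it does
 * not fit.
 */
static int fill_mountpoint(struct statmnt_strings *sm, const char *path)
{
	size_t len = strlen(path) + 1;
	char *dst = (char *)(uintptr_t)sm->mountpoint;

	if (len > sm->mountpoint_len)
		return -1;
	memcpy(dst, path, len);
	return 0;
}
```

Usage: `char mp[64]; struct statmnt_strings sm = { .mountpoint = ptr_to_u64(mp), .mountpoint_len = sizeof(mp) };` after which the string lands directly in `mp`.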