Thread (69 messages) 69 messages, 13 authors, 2023-10-04

Re: [RFC PATCH 2/3] add statmnt(2) syscall

From: Miklos Szeredi <miklos@szeredi.hu>
Date: 2023-09-14 10:14:13
Also in: linux-fsdevel, linux-man, linux-security-module, lkml

On Thu, 14 Sept 2023 at 11:28, Christian Brauner [off-list ref] wrote:
On Wed, Sep 13, 2023 at 05:22:35PM +0200, Miklos Szeredi wrote:
quoted
Add a way to query attributes of a single mount instead of having to parse
the complete /proc/$PID/mountinfo, which might be huge.

Lookup the mount by the old (32bit) or new (64bit) mount ID.  If a mount
needs to be queried based on path, then statx(2) can be used to first query
the mount ID belonging to the path.

Design is based on a suggestion by Linus:

  "So I'd suggest something that is very much like "statfsat()", which gets
   a buffer and a length, and returns an extended "struct statfs" *AND*
   just a string description at the end."
So what we agreed to at LSFMM was that we split filesystem option
retrieval into a separate system call and just have a very focused
statx() for mounts with just binary and non-variable sized information.
We even gave David a hard time about this. :) I would really love if we
could stick to that.

Linus, I realize this was your suggestion a long time ago but I would
really like us to avoid structs with variable sized fields at the end of
a struct. That's just so painful for userspace and universally disliked.
If you care I can even find the LSFMM video where we have users of that
api requesting that we please don't do this. So it'd be great if you
wouldn't insist on it.
I completely missed that.

What I'm thinking is making it even simpler for userspace:

struct statmnt {
  ...
  char *mnt_root;
  char *mountpoint;
  char *fs_type;
  u32 num_opts;
  char *opts;
};

I'd still just keep options nul delimited.

Is there a good reason not to return pointers (pointing to within the
supplied buffer obviously) to userspace?
This will also allow us to turn statmnt() into an extensible argument
system call versioned by size just like we do any new system calls with
struct arguments (e.g., mount_setattr(), clone3(), openat2() and so on).
Which is how we should do things like that.
The mask mechanism also allow versioning of the struct.
Other than that I really think this is on track for what we ultimately
want.
quoted
+struct stmt_str {
+     __u32 off;
+     __u32 len;
+};
+
+struct statmnt {
+     __u64 mask;             /* What results were written [uncond] */
+     __u32 sb_dev_major;     /* Device ID */
+     __u32 sb_dev_minor;
+     __u64 sb_magic;         /* ..._SUPER_MAGIC */
+     __u32 sb_flags;         /* MS_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */
+     __u32 __spare1;
+     __u64 mnt_id;           /* Unique ID of mount */
+     __u64 mnt_parent_id;    /* Unique ID of parent (for root == mnt_id) */
+     __u32 mnt_id_old;       /* Reused IDs used in proc/.../mountinfo */
+     __u32 mnt_parent_id_old;
+     __u64 mnt_attr;         /* MOUNT_ATTR_... */
+     __u64 mnt_propagation;  /* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */
+     __u64 mnt_peer_group;   /* ID of shared peer group */
+     __u64 mnt_master;       /* Mount receives propagation from this ID */
+     __u64 propagate_from;   /* Propagation from in current namespace */
+     __u64 __spare[20];
+     struct stmt_str mnt_root;       /* Root of mount relative to root of fs */
+     struct stmt_str mountpoint;     /* Mountpoint relative to root of process */
+     struct stmt_str fs_type;        /* Filesystem type[.subtype] */
I think if we want to do this here we should add:

__u64 fs_type
__u64 fs_subtype

fs_type can just be our filesystem magic number and we introduce magic
It's already there: sb_magic.

However it's not a 1:1 mapping (ext* only has one magic).
numbers for sub types as well. So we don't need to use strings here.
Ugh.

Thanks,
Miklos
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help