Re: [RFC PATCH] getvalues(2) prototype
From: Dave Chinner <david@fromorbit.com>
Date: 2022-03-23 22:58:55
Also in:
linux-api, linux-fsdevel, linux-man, lkml
On Tue, Mar 22, 2022 at 08:27:12PM +0100, Miklos Szeredi wrote:
Add a new userspace API that allows getting multiple short values in a
single syscall.

This would be useful for the following reasons:

- Calling open/read/close for many small files is inefficient. E.g. on my
desktop invoking lsof(1) results in ~60k open + read + close calls under
/proc and 90% of those are 128 bytes or less.

How does doing the open/read/close in a single syscall make this any more efficient? All it saves is the overhead of a couple of syscalls; it doesn't reduce any of the setup or teardown overhead needed to read the data itself....

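For reference, the per-file pattern being discussed looks roughly like the sketch below. read_small_file() is a hypothetical helper written only for illustration; it does not come from the patch. The comments mark the work done by each call, which is still needed if the three calls are merged into one.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes from a small file.
 * Returns the number of bytes read, or -1 on failure.
 */
static ssize_t read_small_file(const char *path, char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    /* Syscall 1: path lookup, permission checks and file setup. */
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Syscall 2: the data is produced and copied to userspace. */
    ssize_t n = read(fd, buf, len);

    /* Syscall 3: teardown of the open file. */
    close(fd);

    return n;
}
```

Folding these into one call removes two kernel entries and exits per file, but the lookup, setup, copy and teardown noted in the comments are still performed.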
- Interfaces for getting various attributes and statistics are fragmented.
For files we have basic stat, statx, extended attributes, file attributes
(for which there are two overlapping ioctl interfaces). For mounts and
superblocks we have stat*fs as well as /proc/$PID/{mountinfo,mountstats}.
The latter also has the problem of not allowing queries on a specific
mount.

https://xkcd.com/927/

- Some attributes are cheap to generate, some are expensive. Allowing
userspace to select which ones it needs should allow optimizing queries.
- Adding an ascii namespace should allow easy extension and self
description.
- The values can be text or binary, whichever fits best.

The interface definition is:

struct name_val {
    const char *name;        /* in */
    struct iovec value_in;   /* in */
    struct iovec value_out;  /* out */
    uint32_t error;          /* out */
    uint32_t reserved;
};

Ahhh, XFS_IOC_ATTRMULTI_BY_HANDLE reborn. This is how xfsdump gets and sets attributes efficiently when dumping and restoring files - it's an interface that allows batches of xattr operations to be run on a file in a single syscall.

I've said in the past when discussing things like statx() that maybe everything should be addressable via the xattr namespace and set/queried via xattr names regardless of how the filesystem stores the data. The VFS/filesystem simply translates the name to the storage location of the information. It might be held in xattrs, but it could just be a flag bit in an inode field. Then we just get named xattrs in batches from an open fd.

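As a rough illustration of getting named xattrs in a batch from an open fd, the sketch below emulates the idea in userspace with one fgetxattr(2) call per name. struct attr_req and get_attrs() are hypothetical names made up for this example. They are not XFS_IOC_ATTRMULTI_BY_HANDLE or any existing kernel ABI, and a real batch interface would run the loop inside a single syscall.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical descriptor for one attribute in a batch (not a kernel ABI). */
struct attr_req {
    const char *name;  /* in: full xattr name, e.g. "user.comment" */
    void *buf;         /* in: caller-supplied value buffer */
    size_t buf_len;    /* in: size of buf in bytes */
    ssize_t len;       /* out: value length, or -1 on failure */
    int error;         /* out: errno for this entry, 0 on success */
};

/*
 * Userspace emulation of a batched get: one fgetxattr(2) per entry.
 * Each entry records its own result, so one failure does not abort
 * the rest of the batch. Returns the number of entries that succeeded.
 */
static size_t get_attrs(int fd, struct attr_req *reqs, size_t num)
{
    size_t ok = 0;

    assert(reqs != NULL || num == 0);

    for (size_t i = 0; i < num; i++) {
        reqs[i].len = fgetxattr(fd, reqs[i].name,
                                reqs[i].buf, reqs[i].buf_len);
        reqs[i].error = reqs[i].len < 0 ? errno : 0;
        if (reqs[i].len >= 0)
            ok++;
    }
    return ok;
}
```

The per-entry error field mirrors the error member of struct name_val in the proposal, which is what lets a caller tell a missing attribute apart from a failed batch.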
int getvalues(int dfd, const char *path, struct name_val *vec, size_t num,
              unsigned int flags);

@dfd and @path are used to lookup object $ORIGIN. @vec contains @num
name/value descriptors. @flags contains lookup flags for @path.

The syscall returns the number of values filled or an error.

A single name/value descriptor has the following fields:

@name describes the object whose value is to be returned. E.g.

mnt                    - list of mount parameters
mnt:mountpoint         - the mountpoint of the mount of $ORIGIN
mntns                  - list of mount ID's reachable from the current root
mntns:21:parentid      - parent ID of the mount with ID of 21
xattr:security.selinux - the security.selinux extended attribute
data:foo/bar           - the data contained in file $ORIGIN/foo/bar

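To make the shape of the @name strings concrete, here is a minimal sketch that splits a name at its first ':' into a namespace and a remainder. struct name_parts and split_name() are hypothetical and written only for this illustration; the patch's actual parsing is not shown in this mail and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of splitting an @name string (not from the patch). */
struct name_parts {
    const char *ns;    /* start of the namespace, e.g. "mntns" */
    size_t ns_len;     /* namespace length, excluding the ':' */
    const char *rest;  /* text after the first ':', or NULL if none */
};

/* Split "namespace:rest" at the first ':' without copying the string. */
static struct name_parts split_name(const char *name)
{
    struct name_parts p;
    const char *colon;

    assert(name != NULL);

    colon = strchr(name, ':');
    p.ns = name;
    p.ns_len = colon ? (size_t)(colon - name) : strlen(name);
    p.rest = colon ? colon + 1 : NULL;
    return p;
}
```

With this split, "mntns:21:parentid" yields the namespace "mntns" and the remainder "21:parentid", while a bare "mnt" has no remainder.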
How are these different from just declaring new xattr namespaces for these things? E.g. open any file and list the xattrs in the xattr:mount.mnt namespace to get the list of mount parameters for that mount.

Why do we need a new "xattr in everything but name" interface when we could just extend the one we've already got and formalise a new, cleaner version of xattr batch APIs that have been around for 20-odd years already?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com