Re: [RFC PATCH] getting misc stats/attributes via xattr API

From: Ian Kent <raven@themaw.net>
Date: 2022-05-10 04:29:16
Also in: linux-fsdevel, linux-man, linux-security-module, lkml

On Tue, 2022-05-10 at 05:49 +0200, Miklos Szeredi wrote:

On Mon, 9 May 2022 at 14:48, Christian Brauner [off-list ref]
wrote:

quoted

One comment about this. We really need to have this interface
support
giving us mount options like "relatime" back in numeric form (I
assume
this will be possible.). It is royally annoying having to maintain
a
mapping table in userspace just to do:

relatime -> MS_RELATIME/MOUNT_ATTR_RELATIME
ro       -> MS_RDONLY/MOUNT_ATTR_RDONLY

A library shouldn't be required to use this interface. Conservative
low-level software that keeps its shared library dependencies
minimal
will need to be able to use that interface without having to go to
an
external library that transforms text-based output to binary form
(Which
I'm very sure will need to happen if we go with a text-based
interface.).

Agreed.

quoted

  This pattern of requesting the size first by passing empty
arguments,
  then allocating the buffer and then passing down that buffer to
  retrieve that value is really annoying to use and error prone (I
do
  of course understand why it exists.).

  For real xattrs it's not that bad because we can assume that
these
  values don't change often and so the race window between
  getxattr(GET_SIZE) and getxattr(GET_VALUES) often doesn't matter.
But
  fwiw, the post > pre check doesn't exist for no reason; we do
indeed
  hit that race.

That code is wrong.  Changing xattr size is explicitly documented in
the man page as a non-error condition:

       If size is specified as zero, these calls return the  current 
size  of
       the  named extended attribute (and leave value unchanged). 
This can be
       used to determine the size of the buffer that should be
supplied  in  a
       subsequent  call.   (But, bear in mind that there is a
possibility that
       the attribute value may change between the two calls,  so 
that  it  is
       still necessary to check the return status from the second
call.)

quoted

  In addition, it is costly having to call getxattr() twice. Again,
for
  retrieving xattrs it often doesn't matter because it's not a
super
  common operation but for mount and other info it might matter.

You don't *have* to retrieve the size, it's perfectly valid to e.g.
start with a fixed buffer size and double the size until the result
fits.

quoted

* Would it be possible to support binary output with this
interface?
  I really think users would love to have an interfact where they
can
  get a struct with binary info back.

I think that's bad taste.   fsinfo(2) had the same issue.  As well as
mount(2) which still interprets the last argument as a binary blob in
certain cases (nfs is one I know of).

quoted

  Especially for some information at least. I'd really love to have
a
  way go get a struct mount_info or whatever back that gives me all
the
  details about a mount encompassed in a single struct.

If we want that, then can do a new syscall with that specific struct
as an argument.

quoted

  Callers like systemd will have to parse text and will end up
  converting everything from text into binary anyway; especially
for
  mount information. So giving them an option for this out of the
box
  would be quite good.

What exactly are the attributes that systemd requires?

It's been a while since I worked on this so my response might not
be too accurrate now.

Monitoring the mount table is used primarily to identify a mount
started and mount completion.

Mount table entry identification requires several fields.

But, in reality, once a direct interface is available it should be
possible to work out what is actually needed and that will be a
rather subset of a mountinfo table entry.

quoted

  Interfaces like statx aim to be as fast as possible because we
exptect
  them to be called quite often. Retrieving mount info is quite
costly
  and is done quite often as well. Maybe not for all software but
for a
  lot of low-level software. Especially when starting services in
  systemd a lot of mount parsing happens similar when starting
  containers in runtimes.

Was there ever a test patch for systemd using fsinfo(2)?  I think
not.

Mmm ... I'm hurt you didn't pay any attention to what I did on this
during the original fsinfo() discussions.

Until systemd people start to reengineer the mount handing to allow
for retrieving a single mount instead of the complete mount table we
will never know where the performance bottleneck lies.

We didn't need the systemd people to do this only review and contribute
to the pr for the change and eventually merge it.

What I did on this showed that using fsinfo() allone about halved the
CPU overhead (from around 4 processes using about 80%) and once the
mount notifications was added too it went down to well under 10% per
process. The problem here was systemd is quite good at servicing events
and reducing event processing overhead meant more events would then be
processed. Utilizing the mount notifications queueing was the key to
improving this and that was what I was about to work on at the end.

But everything stopped before the work was complete.

As I said above it's been a long time since I looked at the systemd
work and it definitely was a WIP so "what you see is what you get"
at https://github.com/raven-au/systemd/commits/. It looks like the
place to look to get some idea of what was being done is branch
notifications-devel or notifications-rfc-pr. Also note that this
uses the libmount fsinfo() infrastrucure that was done by Karal Zak
(and a tiny bit by me) at the time.

quoted

* If we decide to go forward with this interface - and I think I
  mentioned this in the lsfmm session - could we please at least
add a
  new system call? It really feels wrong to retrieve mount and
other
  information through the xattr interfaces. They aren't really
xattrs.

I'd argue with that statement.  These are most definitely attributes.
As for being extended, we'd just extended the xattr interface...

Naming aside... imagine that read(2) has always been used to retrieve
disk data, would you say that reading data from proc feels wrong?
And in hindsight, would a new syscall for the purpose make any sense?

Thanks,
Miklos

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help