Re: [PATCH v4 0/3] fs: introduce getfsxattrat and setfsxattrat syscalls

From: Amir Goldstein <amir73il@gmail.com>
Date: 2025-03-28 14:09:19
Also in: linux-alpha, linux-api, linux-arch, linux-fsdevel, linux-m68k, linux-mips, linux-s390, linux-security-module, linux-sh, linux-xfs, lkml, selinux, sparclinux

On Thu, Mar 27, 2025 at 10:13 PM Pali Rohár [off-list ref] wrote:

On Thursday 27 March 2025 21:57:34 Amir Goldstein wrote:

quoted

On Thu, Mar 27, 2025 at 8:26 PM Pali Rohár [off-list ref] wrote:

quoted

On Thursday 27 March 2025 12:47:02 Amir Goldstein wrote:

quoted

On Sun, Mar 23, 2025 at 11:32 AM Pali Rohár [off-list ref] wrote:

quoted

On Sunday 23 March 2025 09:45:06 Amir Goldstein wrote:

quoted

On Fri, Mar 21, 2025 at 8:50 PM Andrey Albershteyn [off-list ref] wrote:

quoted

This patchset introduced two new syscalls getfsxattrat() and
setfsxattrat(). These syscalls are similar to FS_IOC_FSSETXATTR ioctl()
except they use *at() semantics. Therefore, there's no need to open the
file to get an fd.

These syscalls allow userspace to set filesystem inode attributes on
special files. One of the usage examples is XFS quota projects.

XFS has project quotas which could be attached to a directory. All
new inodes in these directories inherit project ID set on parent
directory.

The project is created from userspace by opening and calling
FS_IOC_FSSETXATTR on each inode. This is not possible for special
files such as FIFO, SOCK, BLK etc. Therefore, some inodes are left
with empty project ID. Those inodes then are not shown in the quota
accounting but still exist in the directory. This is not critical but in
the case when special files are created in the directory with already
existing project quota, these new inodes inherit extended attributes.
This creates a mix of special files with and without attributes.
Moreover, special files with attributes don't have a possibility to
become clear or change the attributes. This, in turn, prevents userspace
from re-creating quota project on these existing files.

Christian, if this get in some mergeable state, please don't merge it
yet. Amir suggested these syscalls better to use updated struct fsxattr
with masking from Pali Rohár patchset, so, let's see how it goes.

Andrey,

To be honest I don't think it would be fair to delay your syscalls more
than needed.

I agree.

quoted

If Pali can follow through and post patches on top of your syscalls for
next merge window that would be great, but otherwise, I think the
minimum requirement is that the syscalls return EINVAL if fsx_pad
is not zero. we can take it from there later.

IMHO SYS_getfsxattrat is fine in this form.

For SYS_setfsxattrat I think there are needed some modifications
otherwise we would have problem again with backward compatibility as
is with ioctl if the syscall wants to be extended in future.

I would suggest for following modifications for SYS_setfsxattrat:

- return EINVAL if fsx_xflags contains some reserved or unsupported flag

- add some flag to completely ignore fsx_extsize, fsx_projid, and
  fsx_cowextsize fields, so SYS_setfsxattrat could be used just to
  change fsx_xflags, and so could be used without the preceding
  SYS_getfsxattrat call.

What do you think about it?

I think all Andrey needs to do now is return -EINVAL if fsx_pad is not zero.

You can use this later to extend for the semantics of flags/fields mask
and we can have a long discussion later on what this semantics should be.

Right?

Amir.

It is really enough?

I don't know. Let's see...

quoted

All new extensions later would have to be added
into fsx_pad fields, and currently unused bits in fsx_xflags would be
unusable for extensions.

I am working under the assumption that the first extension would be
to support fsx_xflags_mask and from there, you could add filesystem
flags support checks and then new flags. Am I wrong?

Obviously, fsx_xflags_mask would be taken from fsx_pad space.
After that extension is implemented, calling SYS_setfsxattrat() with
a zero fsx_xflags_mask would be silly for programs that do not do
the legacy get+set.

So when we introduce  fsx_xflags_mask, we could say that a value
of zero means that the mask is not being checked at all and unknown
flags in set syscall are ignored (a.k.a legacy ioctl behavior).

Programs that actually want to try and set without get will have to set
a non zero fsx_xflags_mask to do something useful.

Here we need to also solve the problem that without GET call we do not
have valid values for fsx_extsize, fsx_projid, and fsx_cowextsize. So
maybe we would need some flag in fsx_pad that fsx_extsize, fsx_projid,
or fsx_cowextsize are ignored/masked.

quoted

I don't think this is great.
I would rather that the first version of syscalls will require the mask
and will always enforce filesystems supported flags.

It is not great... But what about this? In a first step (part of this
syscall patch series) would be just a check that fsx_pad is zero.
Non-zero will return -EINVAL.

In next changes would added fsx_filter bit field, which for each
fsx_xflags and also for fsx_extsize, fsx_projid, and fsx_cowextsize
fields would add a new bit flag which would say (when SET) that the
particular thing has to be ignored.

1. I don't like the inverse mask. statx already has the stx_mask
    and stx_attributes_mask, so I rather stick to same semantics
    because some of those attributes are exposed via statx as well
2. fsx_*extsize already have a bit that says if that the particular
    attribute is valid or not, so setting a zero fsx_cowextsize with the
    flag FS_XFLAG_COWEXTSIZE has no effect in xfs:

        /*
         * Only set the extent size hint if we've already determined that the
         * extent size hint should be set on the inode. If no extent size flags
         * are set on the inode then unconditionally clear the extent size hint.
         */
        if (ip->i_diflags & (XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT))
                ip->i_extsize = XFS_B_TO_FSB(mp, fa->fsx_extsize);
        else
                ip->i_extsize = 0;

        if (xfs_has_v3inodes(mp)) {
                if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
                        ip->i_cowextsize = XFS_B_TO_FSB(mp, fa->fsx_cowextsize);
                else
                        ip->i_cowextsize = 0;
        }

I think we need to enforce this logic in fileattr_set_prepare()
and I think we need to add a flag FS_XFLAG_PROJID
that will be set in GET when fsx_projid != 0 and similarly
required when setting fsx_projid != 0.

Probably will need to add some backward compat glue for this
flag in GET ioctl to avoid breaking out of tree fs and fuse.

So when fsx_pad is all-zeros then fsx_filter (first field in fsx_pad)
would say that nothing in fsx_xflags, fsx_extsize, fsx_projid, and
fsx_cowextsize is ignored, and hence behave like before.

And when something in fsx_pad/fsx_filter is set then it says which
fields are ignored/filtered-out.

quoted

If you can get those patches (on top of current series) posted and
reviewed in time for the next merge window, including consensus
on the actual semantics, that would be the best IMO.

I think that this starting to be more complicated to rebase my patches
in a way that they do not affect IOCTL path but implement it properly
for new syscall path. It does not sounds like a trivial thing which I
would finish in merge window time and having proper review and consensus
on this.

Yes, it is better to separate the two efforts.

wrt erroring on unsupported SET flags, all fs other than xfs already
have some variant of fileattr_has_fsx(), so xfs is the only filesystem
that requires special care with the new syscalls.
It's easier to write a patch than it is to explain what I mean, so
I'll try to write a patch.

Thanks,
Amir.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help