Re: [v8 4/5] ext4: adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface support
From: Konstantin Khlebnikov <hidden>
Date: 2015-02-05 09:32:10
Also in:
linux-ext4, linux-fsdevel
On 05.02.2015 01:58, Dave Chinner wrote:
On Wed, Feb 04, 2015 at 06:22:01PM +0300, Konstantin Khlebnikov wrote:
On 28.01.2015 03:37, Dave Chinner wrote:
On Tue, Jan 27, 2015 at 01:45:17PM +0300, Konstantin Khlebnikov wrote:
On 27.01.2015 11:02, Dave Chinner wrote:
On Fri, Jan 23, 2015 at 03:59:04PM -0800, Andy Lutomirski wrote:
On Fri, Jan 23, 2015 at 3:30 PM, Dave Chinner [off-list ref] wrote:
On Fri, Jan 23, 2015 at 02:58:09PM +0300, Konstantin Khlebnikov wrote:

I think I must be missing something simple here. In a hypothetical world where the code used nsown_capable, if an admin wants to stick a container in /mnt/container1 with associated prid 1 and a userns, shouldn't it just map only prid 1 into the user ns? Then a user in that userns can't try to change the prid of a file to 2 because the number "2" is unmapped for that user and translation will fail.

You've effectively said "yes, project quotas are enabled, but you only have a single ID, it's always turned on and you can't change it to anything else". So, why do they need to be mapped via user namespaces to enable this?

Think about it a little harder:

- Project IDs are not user IDs.
- Project IDs are not a security/permission mechanism.

First, I'll just point this out again...
Ok, I get it.
This might be useful even without containers: normal user quota has two levels, and admins might classify users into groups and set group quota for them. Project quota is flat and cannot provide any control if we want to classify projects.

I don't follow. Project ID is exactly what allows you to control project classification.

I mean hierarchy allows grouping several projects into one super-project which sums all disk usage and could have its own limit too.

Yes, I know, but you can also do this resource management from userspace with the existing project quota tools. It's just a matter of layering hierarchical limit management on top of the existing infrastructure.

Yes, but not in all cases: it's impossible to overcommit disk limits at the project level without overcommitting at the super-project level. Hierarchical quotas can handle this [hypothetically useful] use case.
For now I'm more interested in partitioning disk space among services in one system. As I see it, the security model of project quota in XFS is almost non-existent for this case: it forbids linking/renaming files between different projects, but any unprivileged user might change the project ID of their own files. That's strange; this operation should be privileged.

<sigh>

It's clear you don't understand the design/architecture of project quotas. You've clearly read the code, but you haven't understood the design that led to the specific implementation in XFS. Users have *always* been allowed to set the project ID of their own files. How else are they going to set the project ID on files they create in random directories so as to account them to the correct project they are working on?

In this case project disk limits are almost useless and even dangerous, because any unprivileged user could add files to a limited project which belongs to another user.
However, you keep making the assumption that project quotas == directory subtree quotas. Project quotas are *not limited* to directory subtrees - the subtree quota implementation is just an implementation that *sets the default project ID* on files as they are created.

e.g. there are production systems out there where project quotas are used to track home directory space usage rather than user quotas. This means users can take actions like "this file actually belongs to project X and it shouldn't be accounted against my home directory". Users can create their own sub directories that account everything by default to project X rather than their own home directory.

Again: project quotas are an *accounting* mechanism, not a security mechanism.

Containers are a *security mechanism* and hence we need a security model for container resource controller mechanisms. Project quotas do not provide a directory hierarchy access security model - that's what we use mount namespaces for. The resource controller security model only has to prevent users inside the container from subverting the resource controller mechanism, not anything else.

Not surprisingly, we've implemented *exactly* the model you are suggesting: that modification of the resource accounting mechanism is a privileged operation that cannot be accessed from within the container. i.e. inside a userns container you can't change the project ID on a file, not even as root.
Also, if a user has permission to change the project ID, he could be permitted to link and rename a file into a directory with any project ID, because he could anyway change the project, move the file, and revert it back.

You don't appear to understand why XFS forbids linking/renaming across directories with different project IDs. Hint: it's resource accounting simplification, *not a security mechanism*.

Linking is obvious: you can't have the same inode accounted to multiple projects - it belongs to a single project and so can't be accounted to multiple projects. Hence if you want to link across different directory-based project quotas, you have to use symlinks.

That's much simpler than having to decide what project the inode is accounted to, especially when removing links and the link that owns the project ID is removed. How do you even know the link you are removing is the last link in the current project? IOWs, you have to search for the other owners of the inode to determine who the project quota is now accounted to...
But you have to search for hardlinks everywhere (the inode owner can hardlink it into any directory where he has write access, because the project can be changed temporarily). And after that you have to search for broken symlinks. Also, symlinks cannot share a file between isolated containers which run in chroot, while creating hardlinks is still possible but requires some extra steps, like changing the project ID or creating temporary directories, even if you're root. Not so useful either.

Probably that's the reason why this feature seems never to have been implemented anywhere except XFS. Could we change that? For example, by adding a flag to the quota-info block which makes the project ID more restrictive and useful?
Same for rename: there are a multitude of nasty corner cases when it comes to accounting the quotas correctly. So, either we try to do something complex and likely expensive and buggy, or we can return EXDEV.

EXDEV was very carefully chosen here, and it's not for security reasons. It was chosen because applications know that if a rename returns EXDEV, they've got to *copy* the file instead. And, well, that create/write/unlink process results in correct project quota accounting at both the source and destination.

IOWs: EXDEV is not a security mechanism, it's an accounting mechanism. If you can implement project quota rename accounting and handle the multiple hardlinks problem efficiently, then you can allow those things to be done directly in the filesystem rather than returning EXDEV.
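The copy fallback described above can be sketched roughly as follows. This is only an illustration of the create/write/unlink pattern an application might use when rename(2) fails with EXDEV; the helper names copy_file() and rename_or_copy() are hypothetical and do not come from the patch set or from any existing tool.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the contents of src into a newly created dst.
 * The new inode is created in the destination directory, so it is charged
 * to whatever project that directory assigns by default.
 * Returns 0 on success, -1 on error. Ownership, mode and timestamps are
 * not preserved in this sketch.
 */
int copy_file(const char *src, const char *dst)
{
    char buf[65536];
    ssize_t n = 0;
    int in, out, ret = -1;

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof(buf))) > 0) {
        char *p = buf;

        /* write(2) may be short, so loop until the chunk is flushed */
        while (n > 0) {
            ssize_t w = write(out, p, (size_t)n);

            if (w < 0)
                goto done;
            p += w;
            n -= w;
        }
    }
    if (n == 0)        /* clean EOF, not a read error */
        ret = 0;
done:
    close(in);
    if (close(out) < 0)
        ret = -1;
    return ret;
}

/*
 * Hypothetical helper: try rename(2) first; if the filesystem answers
 * EXDEV (for example because source and destination directories carry
 * different project IDs), fall back to copy + unlink.
 * A failed copy may leave a partial dst behind in this sketch.
 */
int rename_or_copy(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1;
    if (copy_file(src, dst) < 0)
        return -1;
    return unlink(src);
}
```

Tools such as mv(1) already behave this way for cross-device moves, which is why returning EXDEV needs no application changes.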
For me the perfect interface looks like a couple of fcntls for getting/changing the project ID:

  int fcntl(fd, F_GET_PROJECT, projid_t *);
  int fcntl(fd, F_SET_PROJECT, projid_t);

F_GET_PROJECT is allowed for everybody
F_SET_PROJECT requires CAP_SYS_ADMIN (or maybe CAP_FOWNER?)

Sure, it's nice, but you're ignoring the entire point of making FS_IOC_FSSETXATTR generic: so that the *existing tools* that manage project quotas work on all project quota enabled filesystems. i.e. so that all filesystems *behave the same* and can *run identical regression tests*.
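For comparison, reading and changing a project ID through the ioctl pair named in the subject line might look roughly like this. The sketch assumes kernel headers in which <linux/fs.h> provides struct fsxattr together with FS_IOC_FSGETXATTR and FS_IOC_FSSETXATTR (at the time of this thread the equivalent definitions were XFS-specific); the wrapper names get_project_id() and set_project_id() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* assumed to provide struct fsxattr, FS_IOC_FS{GET,SET}XATTR */

/*
 * Hypothetical wrapper: fetch the project ID of the open file fd.
 * Returns 0 on success, -1 with errno set by ioctl(2) on failure.
 */
int get_project_id(int fd, uint32_t *projid)
{
    struct fsxattr fsx;

    assert(projid != NULL);
    memset(&fsx, 0, sizeof(fsx));
    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
        return -1;
    *projid = fsx.fsx_projid;
    return 0;
}

/*
 * Hypothetical wrapper: change the project ID of the open file fd.
 * Read-modify-write, so the other fsxattr fields (flags, extent size
 * hint) are passed back unchanged. Whether the caller is allowed to do
 * this is decided by the filesystem, which is the point under debate.
 */
int set_project_id(int fd, uint32_t projid)
{
    struct fsxattr fsx;

    memset(&fsx, 0, sizeof(fsx));
    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
        return -1;
    fsx.fsx_projid = projid;
    return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
}
```

The read-modify-write step is the main practical difference from the proposed fcntl pair: the ioctl carries the whole fsxattr structure, not just the project ID.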
As I see it, the quota tools in xfsprogs check the file-system name and don't work for anything except "xfs", so we have to patch them anyway. xfstests are cool, but I think fixing one ioctl isn't a problem. Something else?
We do not want different project quota implementations on different filesystems. Like user and group quotas, they need to be consistently implemented across all filesystems. If you want something new, different and incompatible with existing infrastructure, then that's a separate line of development and discussion....

Cheers,
Dave.
-- Konstantin