Re: [RFC][PATCH] fanotify: allow setting FAN_CREATE in mount mark mask
From: Amir Goldstein <amir73il@gmail.com>
Date: 2021-03-31 11:31:24
Also in:
linux-fsdevel
On Wed, Mar 31, 2021 at 12:46 PM Christian Brauner [off-list ref] wrote:
On Tue, Mar 30, 2021 at 05:56:25PM +0300, Amir Goldstein wrote:quoted
quoted
quoted
quoted
My example probably would be something like: mount -t ext4 /dev/sdb /A 1. FAN_MARK_MOUNT(/A) mount --bind /A /B 2. FAN_MARK_MOUNT(/B) mount -t ecryptfs /B /C 3. FAN_MARK_MOUNT(/C) let's say I now do touch /C/bla I may be way off here but intuitively it seems both 1. and 2. should get a creation event but not 3., right?Why not 3? You explicitly set a mark on /C requesting to be notified when objects are created via /C.Sorry, that was a typo. I meant to write, both 2. and 3. should get a creation event but not 1.quoted
quoted
But with your proposal would both 1. and 2. still get a creation event?Same obvious typo. The correct question would be: with your proposal do 2. and 3. both get an event? Because it feels like they both should since /C is mounted on top of /B and ecryptfs acts as a shim. Both FAN_MARK_MOUNT(/B) and FAN_MARK_MOUNT(/C) should get a creation event after all both will have mnt->mnt_fsnotify_marks set.Right. There are two ways to address this inconsistency: 1. Change internal callers of vfs_ helpers to use a private mount, as you yourself suggested for ecryptfs and cachefilesI feel like this is he correct thing to do independently of the fanotify considerations. I think I'll send an RFC for this today or later this week.quoted
2. Add fsnotify_path_ hooks at caller site - that would be the correct thing to do for nfsd IMOI do not have an informed opinion yet on nfsd so I simply need to trust you here. :)
As long as "exp_export: export of idmapped mounts not yet supported.\n" I don't think it matters much. It feels like adding idmapped mounts to nfsd is on your roadmap. When you get to that we can discuss adding fsnotify path hooks to nfsd if Jan agrees to the fsnotify path hooks concept.
quoted
quoted
quoted
They would not get an event, because fsnotify() looks for CREATE event subscribers on inode->i_fsnotify_marks and inode->i_sb_s_fsnotify_marks and does not find any.Well yes, but my example has FAN_MARK_MOUNT(/B) set. So fanotify _should_ look at (!mnt || !mnt->mnt_fsnotify_marks) && and see that there are subscribers and should notify the subscribers in /B even if the file is created through /C. My point is with your solution this can't be handled and I want to make sure that this is ok. Because right now you'd not be notified about a new file having been created in /B even though mnt->mnt_fsnotify_marks is set and the creation went through /B via /C.If you are referring to the ecryptfs use case specifically, then I think it is ok. After all, whether ecryptfs uses a private mount clone or not is not something the user can know.quoted
_Unless_ we switch to an argument like overlayfs and say "This is a private mount which is opaque and so we don't need to generate events.". Overlayfs handles this cleanly due to clone_private_mount() which will shed all mnt->mnt_fsnotify_marks and ecryptfs should too if that is the argument we follow, no?There is simply no way that the user can infer from the documentation of FAN_MARK_MOUNT that the event on /B is expected when /B is underlying layer of ecryptfs or overlayfs. It requires deep internal knowledge of the stacked fs implementation. In best case, the user can infer that she MAY get an event on /B. Some users MAY also expect to get an event on /A because they do not understand the concept of bind mounts... Clone a mount ns and you will get more lost users...I disagree to some extent. For example, a user might remount /B read-only at which point /C is effectively read-only too which makes it plain obvious to the user that /C piggy-backs on /B.
Yes, but that is a bug. /C should not become read-only. It should use a private clone of /B, so I don't see where this is going.
But leaving that aside my questioning is more concerned with whether the implementation we're aiming for is consistent and intuitive and that stacking example came to my mind pretty quickly.
This implementation is a compromise for not having clear user mount context in all places that call for an event. For every person you find that thinks it is intuitive to get an event on /B for touch C/bla, you will find another person that thinks it is not intuitive to get an event. I think we are way beyond the stage with mount namespaces where intuition alone suffice. w.r.t consistent, you gave a few examples and I suggested how IMO they should be fixed to behave consistently. If you have other examples of alleged inconsistencies, please list them.
quoted
quoted
quoted
The vfs_create() -> fsnotify_create() hook passes data_type inode to fsnotify() so there is no fsnotify_data_path() to extract mnt event subscribers from.Right, that was my point. You don't have the mnt context for the underlying fs at a time when e.g. call vfs_link() which ultimately calls fsnotify_create/link() which I'm saying might be a problem.It's a problem. If it wasn't a problem I wouldn't need to work around it ;-) It would be a problem if people think that the FAN_MOUNT_MARK is a subtree mark, which it certainly is not. And I have no doubt thatI don't think subtree marks figure into the example above. But we digress.quoted
as Jan said, people really do want a subtree mark. My question to you with this RFC is: Does the ability to subscribe to CREATE/DELETE/MOVE events on a mount help any of your use cases? With or without the property that mount marks are allowedSince I explicitly pointed on in a prior mail that it would be great to have the same events as for a regular fanotify watch I think I already answered that question. :)quoted
inside userns for idmapped mounts.But if it helps then I'll do it once: yes, both would indeed be very useful.
OK. I understand that the "promise" of those abilities is very useful. Please also confirm once you tested the demo code that the new events on an idmapped mount will "actually" be useful to container managers. If you can work my demo code into a demo branch for the bind mount injection or something that would be best. The reason I am asking this is because while I was working on enabling sb/mount marks inside userns I found several other issues (e.g. open_by_handle_at()) without fixing them the demo would have been much less impressive and much less useful in practice. So I would like to know that we really have all the pieces needed for a useful solution, before proposing the fanotify patches. Thanks, Amir.