Re: [RFC][PATCH] fanotify: allow setting FAN_CREATE in mount mark mask

From: Amir Goldstein <amir73il@gmail.com>
Date: 2021-03-31 11:31:24
Also in: linux-fsdevel

On Wed, Mar 31, 2021 at 12:46 PM Christian Brauner
[off-list ref] wrote:

On Tue, Mar 30, 2021 at 05:56:25PM +0300, Amir Goldstein wrote:

quoted

My example probably would be something like:

mount -t ext4 /dev/sdb /A

1. FAN_MARK_MOUNT(/A)

mount --bind /A /B

2. FAN_MARK_MOUNT(/B)

mount -t ecryptfs /B /C

3. FAN_MARK_MOUNT(/C)

let's say I now do

touch /C/bla

I may be way off here but intuitively it seems both 1. and 2. should get
a creation event but not 3., right?

Why not 3?
You explicitly set a mark on /C requesting to be notified when
objects are created via /C.

Sorry, that was a typo. I meant to write, both 2. and 3. should get a
creation event but not 1.

quoted

But with your proposal would both 1. and 2. still get a creation event?

Same obvious typo. The correct question would be: with your proposal do
2. and 3. both get an event?

Because it feels like they both should since /C is mounted on top of /B
and ecryptfs acts as a shim. Both FAN_MARK_MOUNT(/B) and
FAN_MARK_MOUNT(/C) should get a creation event after all both will have
mnt->mnt_fsnotify_marks set.

Right.

There are two ways to address this inconsistency:
1. Change internal callers of vfs_ helpers to use a private mount,
    as you yourself suggested for ecryptfs and cachefiles

I feel like this is he correct thing to do independently of the fanotify
considerations. I think I'll send an RFC for this today or later this
week.

quoted

2. Add fsnotify_path_ hooks at caller site - that would be the
    correct thing to do for nfsd IMO

I do not have an informed opinion yet on nfsd so I simply need to trust
you here. :)

As long as "exp_export: export of idmapped mounts not yet supported.\n"
I don't think it matters much.
It feels like adding idmapped mounts to nfsd is on your roadmap.
When you get to that we can discuss adding fsnotify path hooks to nfsd
if Jan agrees to the fsnotify path hooks concept.

quoted

They would not get an event, because fsnotify() looks for CREATE event
subscribers on inode->i_fsnotify_marks and inode->i_sb_s_fsnotify_marks
and does not find any.

Well yes, but my example has FAN_MARK_MOUNT(/B) set. So fanotify
_should_ look at
            (!mnt || !mnt->mnt_fsnotify_marks) &&
and see that there are subscribers and should notify the subscribers in
/B even if the file is created through /C.

My point is with your solution this can't be handled and I want to make
sure that this is ok. Because right now you'd not be notified about a
new file having been created in /B even though mnt->mnt_fsnotify_marks
is set and the creation went through /B via /C.

If you are referring to the ecryptfs use case specifically, then I think it is
ok. After all, whether ecryptfs uses a private mount clone or not is not
something the user can know.

quoted

_Unless_ we switch to an argument like overlayfs and say "This is a
private mount which is opaque and so we don't need to generate events.".
Overlayfs handles this cleanly due to clone_private_mount() which will
shed all mnt->mnt_fsnotify_marks and ecryptfs should too if that is the
argument we follow, no?

There is simply no way that the user can infer from the documentation
of FAN_MARK_MOUNT that the event on /B is expected when /B is
underlying layer of ecryptfs or overlayfs.
It requires deep internal knowledge of the stacked fs implementation.
In best case, the user can infer that she MAY get an event on /B.
Some users MAY also expect to get an event on /A because they do not
understand the concept of bind mounts...
Clone a mount ns and you will get more lost users...

I disagree to some extent. For example, a user might remount /B
read-only at which point /C is effectively read-only too which makes it
plain obvious to the user that /C piggy-backs on /B.

Yes, but that is a bug. /C should not become read-only. It should use
a private clone of /B, so I don't see where this is going.

But leaving that aside my questioning is more concerned with whether the
implementation we're aiming for is consistent and intuitive and that
stacking example came to my mind pretty quickly.

This implementation is a compromise for not having clear user mount
context in all places that call for an event.
For every person you find that thinks it is intuitive to get an event on /B
for touch C/bla, you will find another person that thinks it is not intuitive
to get an event. I think we are way beyond the stage with mount
namespaces where intuition alone suffice.

w.r.t consistent, you gave a few examples and I suggested how IMO
they should be fixed to behave consistently.
If you have other examples of alleged inconsistencies, please list them.

quoted

The vfs_create() -> fsnotify_create() hook passes data_type inode to
fsnotify() so there is no fsnotify_data_path() to extract mnt event
subscribers from.

Right, that was my point. You don't have the mnt context for the
underlying fs at a time when e.g. call vfs_link() which ultimately calls
fsnotify_create/link() which I'm saying might be a problem.

It's a problem. If it wasn't a problem I wouldn't need to work around it ;-)

It would be a problem if people think that the FAN_MOUNT_MARK
is a subtree mark, which it certainly is not. And I have no doubt that

I don't think subtree marks figure into the example above. But we
digress.

quoted

as Jan said, people really do want a subtree mark.

My question to you with this RFC is: Does the ability to subscribe to
CREATE/DELETE/MOVE events on a mount help any of your use
cases? With or without the property that mount marks are allowed

Since I explicitly pointed on in a prior mail that it would be great to
have the same events as for a regular fanotify watch I think I already
answered that question. :)

quoted

inside userns for idmapped mounts.

But if it helps then I'll do it once: yes, both would indeed be very
useful.

OK. I understand that the "promise" of those abilities is very useful.
Please also confirm once you tested the demo code that the new
events on an idmapped mount will "actually" be useful to container
managers. If you can work my demo code into a demo branch for
the bind mount injection or something that would be best.

The reason I am asking this is because while I was working on
enabling sb/mount marks inside userns I found several other issues
(e.g. open_by_handle_at()) without fixing them the demo would have
been much less impressive and much less useful in practice.

So I would like to know that we really have all the pieces needed for
a useful solution, before proposing the fanotify patches.

Thanks,
Amir.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help