Thread (33 messages) 33 messages, 3 authors, 2021-03-28

Re: [PATCH v2 0/2] unprivileged fanotify listener

From: Amir Goldstein <amir73il@gmail.com>
Date: 2021-03-24 15:06:47
Also in: linux-fsdevel

On Wed, Mar 24, 2021 at 4:32 PM Christian Brauner
[off-list ref] wrote:
On Wed, Mar 24, 2021 at 03:57:12PM +0200, Amir Goldstein wrote:
quoted
quoted
quoted
Now tested FAN_MARK_FILESYSTEM watch on tmpfs mounted
inside userns and works fine, with two wrinkles I needed to iron:

1. FAN_REPORT_FID not supported on tmpfs because tmpfs has
    zero f_fsid (easy to fix)
2. open_by_handle_at() is not userns aware (can relax for
    FS_USERNS_MOUNT fs)

Pushed these two fixes to branch fanotify_userns.
Pushed another fix to mnt refcount bug in WIP and another commit to
add the last piece that could make fanotify usable for systemd-homed
setup - a filesystem watch filtered by mnt_userns (not tested yet).
Now I used mount-idmapped (from xfstest) to test that last piece.
Found a minor bug and pushed a fix.

It is working as expected, that is filtering only the events generated via
the idmapped mount. However, because the listener I tested is capable in
the mapped userns and not in the sb userns, the listener cannot
open_ny_handle_at(), so the result is not as useful as one might hope.
This is another dumb question probably but in general, are you saying
that someone watching a mount or directory and does _not_ want file
descriptors from fanotify to be returned has no other way of getting to
the path they want to open other than by using open_by_handle_at()?
Well there is another way.
It is demonstrated in my demo with intoifywatch --fanotify --recursive.
It involved userspace iterating a subtree of interest to create fid->path
map.

The fanotify recursive watch is similar but not exactly the same as the
old intoify recursive watch, because with inotify recursive watch you
can miss events.

With fanotify recursive watch, the listener (if capable) can setup a
filesystem mark so events will not be missed. They will be recorded
by fid with an unknown path and the path information can be found later
by the crawler and updated in the map before the final report.

Events on fid that were not found by the crawler need not be reported.
That's essentially a subtree watch for the poor implemented in userspace.

I did not implement the combination --fanotify --global --recursive in my
demo, because it did not make sense with the current permission model
(listener that can setup a fs mark can always resolve fids to path), but it
would be quite trivial to add.

quoted
I guess we will also need to make open_by_handle_at() idmapped aware
and use a variant of vfs_dentry_acceptable() that validates that the opened
path is legitimately accessible via the idmapped mount.
So as a first step, I think there's a legitimate case to be made for
open_by_handle_at() to be made useable inside user namespaces. That's a
change worth to be made independent of fanotify. For example, nowadays
cgroups have a 64 bit identifier that can be used with open_by_handle_at
to map a cgrp id to a path and back:
https://lkml.org/lkml/2020/12/2/1126
Right now this can't be used in user namespaces because of this
restriction but it is genuinely useful to have this feature available
since cgroups are FS_USERNS_MOUNT and that identifier <-> path mapping
is very convenient.
FS_USERNS_MOUNT is a simple case and I think it is safe.
There is already a patch for that on my fanotify_userns branch.
Without looking at the code I'm not super sure how name_to_handle_at()
and open_by_handle_at() behave in the face of mount namespaces so that
would need looking into to. But it would be a genuinely useful change, I
think.
name_to_handle_at()/open_by_handle_at() should be indifferent to mount ns,
because the former returns mount_id which is globally unique (I think)
and the latter takes a mount_fd argument, which means the caller needs to
prove access to the mount prior to getting an open fd in that mount.

Thanks,
Amir.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help