Thread (26 messages) 26 messages, 4 authors, 2024-11-19

Re: [RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler

From: Alexei Starovoitov <hidden>
Date: 2024-11-15 20:04:15
Also in: bpf, linux-fsdevel, lkml

On Thu, Nov 14, 2024 at 11:26 PM Amir Goldstein [off-list ref] wrote:
On Thu, Nov 14, 2024 at 9:14 PM Alexei Starovoitov
[off-list ref] wrote:
quoted
On Thu, Nov 14, 2024 at 12:44 AM Song Liu [off-list ref] wrote:
quoted
+
+       if (bpf_is_subdir(dentry, v->dentry))
+               ret = FAN_FP_RET_SEND_TO_USERSPACE;
+       else
+               ret = FAN_FP_RET_SKIP_EVENT;
It seems to me that all these patches and feature additions
to fanotify, new kfuncs, etc are done just to do the above
filtering by subdir ?

If so, just hard code this logic as an extra flag to fanotify ?
So it can filter all events by subdir.
bpf programmability makes sense when it needs to express
user space policy. Here it's just a filter by subdir.
bpf hammer doesn't look like the right tool for this use case.
Good question.

Speaking as someone who has made several attempts to design
efficient subtree filtering in fanotify, it is not as easy as it sounds.

I recently implemented a method that could be used for "practical"
subdir filtering in userspace, not before Jan has questioned if we
should go directly to subtree filtering with bpf [1].

This is not the only filter that was proposed for fanotify, where bpf
filter came as an alternative proposal [2], but subtree filtering is by far
the most wanted filter.
Interesting :)
Thanks for the links. Long threads back then.
Looks like bpf filtering was proposed 2 and 4 years ago and
Song's current attempt is the first real prototype that actually
implements what was discussed for so long?
Cool. I guess it has legs then.
The problem with implementing a naive is_subtree() filter in fanotify
is the unbounded cost to be paid by every user for every fs access
when M such filters are installed deep in the fs tree.
Agree that the concern is real.
I just don't see how filtering in bpf vs filtering in the kernel
is going to be any different.
is_subdir() is fast, so either kernel or bpf prog calling it
on every file_open doesn't seem like a big deal.
Are you saying that 'naive is_subdir' would call is_subdir
M times for every such filter ?
I guess it can be optimized. When these M filters are installed
the kernel can check whether they're subdirs of each other
and filter a subset of M filters with a single is_subdir() call.
Same logic can be in either bpf prog or in the kernel.
Making this more efficient then becomes a matter of trading of
memory (inode/path cache size) and performance and depends
on the size and depth of the watched filesystem.
This engineering decision *is* the userspace policy that can be
expressed by a bpf program.

As you may know, Linux is lagging behind Win and MacOS w.r.t
subtree filtering for fs events.

MacOS/FreeBSD took the userspace approach with fseventsd [3].
If you Google "fseventsd", you will get results with "High CPU and
Memory Usage" for as far as the browser can scroll.
Yeah. Fun.

Looking at your 2 year old attempt it seems the key part was to
make sure inodes are not pinned for filtering purpose?
How would you call is_subdir() then if parent dentry is not dget() ?
Or meaning of 'pinning' is different?
I hope the bpf-aided early subtree filtering technology would be
able to reduce some of this overhead to facilitate a better engineering
solution, but that remains to be proven...
Sure. Let's explore this further.
Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20220228140556.ae5rhgqsyzm5djbp@quack3.lan/ (local)
[2] https://lore.kernel.org/linux-fsdevel/20200828084603.GA7072@quack2.suse.cz/ (local)
[3] https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/TechnologyOverview/TechnologyOverview.html#//apple_ref/doc/uid/TP40005289-CH3-SW1
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help