Thread (50 messages) 50 messages, 9 authors, 2025-07-26

Re: [PATCH v5 bpf-next 0/5] bpf path iterator

From: Christian Brauner <brauner@kernel.org>
Date: 2025-07-07 10:46:48
Also in: bpf, linux-fsdevel, lkml

On Fri, Jun 27, 2025 at 08:51:21AM +1000, NeilBrown wrote:
On Fri, 27 Jun 2025, Song Liu wrote:
quoted
quoted
On Jun 26, 2025, at 3:22 AM, NeilBrown [off-list ref] wrote:
[...]
quoted
quoted
I guess I misunderstood the proposal of vfs_walk_ancestors() 
initially, so some clarification:

I think vfs_walk_ancestors() is good for the rcu-walk, and some 
rcu-then-ref-walk. However, I don’t think it fits all use cases. 
A reliable step-by-step ref-walk, like this set, works well with 
BPF, and we want to keep it.
The distinction between rcu-walk and ref-walk is an internal
implementation detail.  You as a caller shouldn't need to think about
the difference.  You just want to walk.  Note that LOOKUP_RCU is
documented in namei.h as "semi-internal".  The only uses outside of
core-VFS code is in individual filesystem's d_revalidate handler - they
are checking if they are allowed to sleep or not.  You should never
expect to pass LOOKUP_RCU to an VFS API - no other code does.

It might be reasonable for you as a caller to have some control over
whether the call can sleep or not.  LOOKUP_CACHED is a bit like that.
But for dotdot lookup the code will never sleep - so that is not
relevant.
Unfortunately, the BPF use case is more complicated. In some cases, 
the callback function cannot be call in rcu critical sections. For 
example, the callback may need to read xatter. For these cases, we
we cannot use RCU walk at all. 
I really think you should stop using the terms RCU walk and ref-walk.  I
think they might be focusing your thinking in an unhelpful direction.
Thank you! I really appreciate you helping to shape this API and it
aligns a lot with my thinking.
The key issue about reading xattrs is that it might need to sleep.
Focusing on what might need to sleep and what will never need to sleep
is a useful approach - the distinction is wide spread in the kernel and
several function take a flag indicating if they are permitted to sleep,
or if failure when sleeping would be required.

So your above observation is better described as 

   The vfs_walk_ancestors() API has an (implicit) requirement that the
   callback mustn't sleep.  This is a problem for some use-cases
   where the call back might need to sleep - e.g. for accessing xattrs.

That is a good and useful observation.  I can see three possibly
responses:

1/ Add a vfs_walk_ancestors_maysleep() API for which the callback is
   always allowed to sleep.  I don't particularly like this approach.
Agreed.
2/ Use repeated calls to vfs_walk_parent() when the handling of each
   ancestor might need to sleep.  I see no problem with supporting both
   vfs_walk_ancestors() and vfs_walk_parent().  There is plenty of
   precedent for having different  interfaces for different use cases.
Meh.
3/ Extend vfs_walk_ancestors() to pass a "may sleep" flag to the callback.
I think that's fine.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help