Thread (50 messages) 50 messages, 9 authors, 2025-07-26

Re: [PATCH v5 bpf-next 0/5] bpf path iterator

From: "NeilBrown" <neil@brown.name>
Date: 2025-07-09 22:24:18
Also in: bpf, linux-fsdevel, lkml

On Thu, 10 Jul 2025, Song Liu wrote:
quoted
On Jul 9, 2025, at 9:06 AM, Mickaël Salaün [off-list ref] wrote:\
[...]
quoted
quoted
If necessary, we hide “root" inside @data. This is good. 
quoted
@path would be updated with latest ancestor path (e.g. @root).
Update @path to the last ancestor and hold proper references. 
I missed this part earlier. With this feature, vfs_walk_ancestors 
should work usable with open-codeed bpf path iterator. 

I have a question about this behavior with RCU walk. IIUC, RCU 
walk does not hold reference to @ancestor when calling walk_cb().
I think a reference to the mount should be held, but not necessarily to
the dentry if we are still in the same mount as the original path.
If we update @path and do path_put() after the walk, we have to hold 
reference to both the mnt and the dentry, no? 
quoted
quoted
If walk_cb() returns false, shall vfs_walk_ancestors() then
grab a reference on @ancestor? This feels a bit weird to me.
If walk_cb() checks for a root, it will return false when the path will
match, and the caller would expect to get this root path, right?
If the user want to walk to the global root, walk_cb() may not 
return false at all, IIUC. walk_cb() may also return false on 
other conditions. 
quoted
In general, it's safer to always have the same behavior when holding or
releasing a reference.  I think the caller should then always call
path_put() after vfs_walk_ancestors() whatever the return code is.
quoted
Maybe “updating @path to the last ancestor” should only apply to
LOOKUP_RCU==false case? 
quoted
@flags could contain LOOKUP_RCU or not, which enables us to have
walk_cb() not-RCU compatible.

When passing LOOKUP_RCU, if the first call to vfs_walk_ancestors()
failed with -ECHILD, the caller can restart the walk by calling
vfs_walk_ancestors() again but without LOOKUP_RCU.

Given we want callers to handle -ECHILD and call vfs_walk_ancestors
again without LOOKUP_RCU, I think we should keep @path not changed
With LOOKUP_RCU==true, and only update it to the last ancestor 
when LOOKUP_RCU==false.
As Neil said, we don't want to explicitly pass LOOKUP_RCU as a public
flag.  Instead, walk_cb() should never sleep (and then potentially be
called under RCU by the vfs_walk_ancestors() implementation).
How should the user handle -ECHILD without LOOKUP_RCU flag? Say the
following code in landlocked:

/* Try RCU walk first */
err = vfs_walk_ancestors(path, ll_cb, data, LOOKUP_RCU);

if (err == -ECHILD) {
	struct path walk_path = *path;

	/* reset any data changed by the walk */
	reset_data(data);
	
	/* now do ref-walk */
	err = vfs_walk_ancestors(&walk_path, ll_cb, data, 0);
}

Or do you mean vfs_walk_ancestors will never return -ECHILD?
Then we need vfs_walk_ancestors to call reset_data logic, right?
It isn't clear to me that vfs_walk_ancestors() needs to return anything.
All the communication happens through walk_cb()

walk_cb() is called with a path, the data, and a "may_sleep" flag.
If it needs to sleep but may_sleep is not set, it returns "-ECHILD"
which causes the walk to restart and use refcounts.
If it wants to stop, it returns 0.
If it wants to continue, it returns 1.
If it wants a reference to the path then it can use (new)
vfs_legitimize_path() which might fail.
If it wants a reference to the path and may_sleep is true, it can use
path_get() which won't fail.

When returning -ECHILD (either because of a need to sleep or because
vfs_legitimize_path() fails), walk_cb() would reset_data().

NeilBrown
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help