Re: [PATCH v5 bpf-next 0/5] bpf path iterator
From: Song Liu <hidden>
Date: 2025-06-26 05:52:55
Also in:
bpf, linux-fsdevel, lkml
On Jun 25, 2025, at 6:05 PM, NeilBrown [off-list ref] wrote:
[...]
quoted
I can't speak for Mickaël, but a callback-based interface is less flexible (and _maybe_ less performant?). Also, probably we will want to fallback to a reference-taking walk if the walk fails (rather than, say, retry infinitely), and this should probably use Song's proposed iterator. I'm not sure if Song would be keen to rewrite this iterator patch series in callback style (to be clear, it doesn't necessarily seem like a good idea to me, and I'm not asking him to), which means that we will end up with the reference walk API being a "call this function repeatedly", and the rcu walk API taking a callback. I think it is still workable (after all, if Landlock wants to reuse the code in the callback it can just call the callback function itself when doing the reference walk), but it seems a bit "ugly" to me.call-back can have a performance impact (less opportunity for compiler optimisation and CPU speculation), though less than taking spinlock and references. However Al and Christian have drawn a hard line against making seq numbers visible outside VFS code so I think it is the approach most likely to be accepted. Certainly vfs_walk_ancestors() would fallback to ref-walk if rcu-walk resulted in -ECHILD - just like all other path walking code in namei.c. This would be largely transparent to the caller - the caller would only see that the callback received a NULL path indicating a restart. It wouldn't need to know why.
I guess I misunderstood the proposal of vfs_walk_ancestors() initially, so some clarification: I think vfs_walk_ancestors() is good for the rcu-walk, and some rcu-then-ref-walk. However, I don’t think it fits all use cases. A reliable step-by-step ref-walk, like this set, works well with BPF, and we want to keep it. Can we ship this set as-is (or after fixing the comment reported by kernel test robot)? I really don’t think we need figure out all details about the rcu-walk here. Once this is landed, we can try implementing the rcu-walk (vfs_walk_ancestors or some variation). If no one volunteers, I can give it a try. Thanks, Song