Re: [RFC PATCH 1/3] landlock: walk parent dir without taking references
From: Christian Brauner <brauner@kernel.org>
Date: 2025-06-06 10:25:47
Also in:
linux-fsdevel
On Wed, Jun 04, 2025 at 10:05:01PM +0100, Tingmao Wang wrote:
On 6/4/25 18:15, Mickaël Salaün wrote:quoted
On Wed, Jun 04, 2025 at 01:45:43AM +0100, Tingmao Wang wrote:quoted
[..]@@ -897,10 +898,14 @@ static bool is_access_to_paths_allowed( break; jump_up: if (walker_path.dentry == walker_path.mnt->mnt_root) { + /* follow_up gets the parent and puts the passed in path */ + path_get(&walker_path); if (follow_up(&walker_path)) { + path_put(&walker_path);path_put() cannot be safely called in a RCU read-side critical section because it can free memory which can sleep, and also because it can wait for a lock. However, we can call rcu_read_unlock() before and rcu_read_lock() after (if we hold a reference).Thanks for pointing this out. Actually I think this might be even more tricky. I'm not sure if we can always rely on the dentry still being there after rcu_read_unlock(), regardless of whether we do a path_get() before unlocking... Even when we're inside a RCU read-side critical section, my understanding is that if a dentry reaches zero refcount and is selected to be freed (either immediately or by LRU) from another CPU, dentry_free will do call_rcu(&dentry->d_u.d_rcu, __d_free) which will cause the dentry to immediately be freed after our rcu_read_unlock(), regardless of whether we had a path_get() before that. In fact because lockref_mark_dead sets the refcount to negative, path_get() would simply be wrong. We could use lockref_get_not_dead() instead, and only continue if we actually acquired a reference, but then we have the problem of not being able to dput() the dentry acquired by follow_up(), without risking it getting killed before we can enter RCU again (although I do wonder if it's possible for it to be killed, given that there is an active mountpoint on it that we hold a reference for?). While we could probably do something like "defer the dput() until we next reach a mountpoint and can rcu_read_unlock()", or use lockref_put_return() and assert that the dentry must still have refcount > 0 since it's an in-use mountpoint, after a lot of thinking it seems to me the only clean solution is to have a mechanism of walking up mounts completely reference-free. Maybe what we actually need is choose_mountpoint_rcu(). That function is private, so I guess a question for Al and other VFS people here is, can we potentially expose an equivalent publicly? (Perhaps it would only do effectively what __prepend_path in d_path.c does, and we can track the mount_lock seqcount outside. Also the fact that throughout all this we have a valid reference to the leaf dentry we started from, to me should mean that the mount can't disappear under us anyway)
I have a very hard time warming up to the idea that you're doing reference less path walk out of sight of the core VFS. That smells like a massive landmine. It's barely correct now and it won't get better if this code bitrots when Al or I massage other codepaths. And that happens a lot. We're exposing new infrastructure for the BPF LSM layer that is targeted also to help Landlock. Use the helper exposed for them. It makes things easier for all of us.
quoted
quoted
/* Ignores hidden mount points. */ goto jump_up; } else { + path_put(&walker_path); /* * Stops at the real root. Denies access * because not all layers have granted access.@@ -920,11 +925,11 @@ static bool is_access_to_paths_allowed( } break; } - parent_dentry = dget_parent(walker_path.dentry); - dput(walker_path.dentry); + parent_dentry = walker_path.dentry->d_parent; walker_path.dentry = parent_dentry; } - path_put(&walker_path); + + rcu_read_unlock(); if (!allowed_parent1) { log_request_parent1->type = LANDLOCK_REQUEST_FS_ACCESS;@@ -1045,12 +1050,11 @@ static bool collect_domain_accesses( layer_masks_dom, LANDLOCK_KEY_INODE); - dget(dir); - while (true) { - struct dentry *parent_dentry; + rcu_read_lock(); + while (true) { /* Gets all layers allowing all domain accesses. */ - if (landlock_unmask_layers(find_rule(domain, dir), access_dom, + if (landlock_unmask_layers(find_rule_rcu(domain, dir), access_dom, layer_masks_dom, ARRAY_SIZE(*layer_masks_dom))) { /*@@ -1065,11 +1069,11 @@ static bool collect_domain_accesses( if (dir == mnt_root || WARN_ON_ONCE(IS_ROOT(dir))) break; - parent_dentry = dget_parent(dir); - dput(dir); - dir = parent_dentry; + dir = dir->d_parent; } - dput(dir); + + rcu_read_unlock(); + return ret; } --2.49.0