Thread (13 messages) 13 messages, 4 authors, 2025-06-06

Re: [RFC PATCH 1/3] landlock: walk parent dir without taking references

From: Christian Brauner <brauner@kernel.org>
Date: 2025-06-06 10:25:47
Also in: linux-fsdevel

On Wed, Jun 04, 2025 at 10:05:01PM +0100, Tingmao Wang wrote:
On 6/4/25 18:15, Mickaël Salaün wrote:
quoted
On Wed, Jun 04, 2025 at 01:45:43AM +0100, Tingmao Wang wrote:
quoted
[..]
@@ -897,10 +898,14 @@ static bool is_access_to_paths_allowed(
 			break;
 jump_up:
 		if (walker_path.dentry == walker_path.mnt->mnt_root) {
+			/* follow_up gets the parent and puts the passed in path */
+			path_get(&walker_path);
 			if (follow_up(&walker_path)) {
+				path_put(&walker_path);
path_put() cannot be safely called in a RCU read-side critical section
because it can free memory which can sleep, and also because it can wait
for a lock.  However, we can call rcu_read_unlock() before and
rcu_read_lock() after (if we hold a reference).
Thanks for pointing this out.

Actually I think this might be even more tricky.  I'm not sure if we can
always rely on the dentry still being there after rcu_read_unlock(),
regardless of whether we do a path_get() before unlocking...  Even when
we're inside a RCU read-side critical section, my understanding is that if
a dentry reaches zero refcount and is selected to be freed (either
immediately or by LRU) from another CPU, dentry_free will do
call_rcu(&dentry->d_u.d_rcu, __d_free) which will cause the dentry to
immediately be freed after our rcu_read_unlock(), regardless of whether we
had a path_get() before that.

In fact because lockref_mark_dead sets the refcount to negative,
path_get() would simply be wrong.  We could use lockref_get_not_dead()
instead, and only continue if we actually acquired a reference, but then
we have the problem of not being able to dput() the dentry acquired by
follow_up(), without risking it getting killed before we can enter RCU
again (although I do wonder if it's possible for it to be killed, given
that there is an active mountpoint on it that we hold a reference for?).

While we could probably do something like "defer the dput() until we next
reach a mountpoint and can rcu_read_unlock()", or use lockref_put_return()
and assert that the dentry must still have refcount > 0 since it's an
in-use mountpoint, after a lot of thinking it seems to me the only clean
solution is to have a mechanism of walking up mounts completely
reference-free.  Maybe what we actually need is choose_mountpoint_rcu().

That function is private, so I guess a question for Al and other VFS
people here is, can we potentially expose an equivalent publicly?
(Perhaps it would only do effectively what __prepend_path in d_path.c
does, and we can track the mount_lock seqcount outside.  Also the fact
that throughout all this we have a valid reference to the leaf dentry we
started from, to me should mean that the mount can't disappear under us
anyway)
I have a very hard time warming up to the idea that you're doing
reference less path walk out of sight of the core VFS. That smells like
a massive landmine. It's barely correct now and it won't get better if
this code bitrots when Al or I massage other codepaths. And that happens
a lot.

We're exposing new infrastructure for the BPF LSM layer that is targeted
also to help Landlock. Use the helper exposed for them. It makes things
easier for all of us.
quoted
quoted
 				/* Ignores hidden mount points. */
 				goto jump_up;
 			} else {
+				path_put(&walker_path);
 				/*
 				 * Stops at the real root.  Denies access
 				 * because not all layers have granted access.
@@ -920,11 +925,11 @@ static bool is_access_to_paths_allowed(
 			}
 			break;
 		}
-		parent_dentry = dget_parent(walker_path.dentry);
-		dput(walker_path.dentry);
+		parent_dentry = walker_path.dentry->d_parent;
 		walker_path.dentry = parent_dentry;
 	}
-	path_put(&walker_path);
+
+	rcu_read_unlock();

 	if (!allowed_parent1) {
 		log_request_parent1->type = LANDLOCK_REQUEST_FS_ACCESS;
@@ -1045,12 +1050,11 @@ static bool collect_domain_accesses(
 					       layer_masks_dom,
 					       LANDLOCK_KEY_INODE);

-	dget(dir);
-	while (true) {
-		struct dentry *parent_dentry;
+	rcu_read_lock();

+	while (true) {
 		/* Gets all layers allowing all domain accesses. */
-		if (landlock_unmask_layers(find_rule(domain, dir), access_dom,
+		if (landlock_unmask_layers(find_rule_rcu(domain, dir), access_dom,
 					   layer_masks_dom,
 					   ARRAY_SIZE(*layer_masks_dom))) {
 			/*
@@ -1065,11 +1069,11 @@ static bool collect_domain_accesses(
 		if (dir == mnt_root || WARN_ON_ONCE(IS_ROOT(dir)))
 			break;

-		parent_dentry = dget_parent(dir);
-		dput(dir);
-		dir = parent_dentry;
+		dir = dir->d_parent;
 	}
-	dput(dir);
+
+	rcu_read_unlock();
+
 	return ret;
 }

--
2.49.0
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help