Re: [PATCH v3 2/4] landlock: Fix handling of disconnected directories

From: Mickaël Salaün <mic@digikod.net>
Date: 2025-07-24 17:10:23
Also in: linux-fsdevel

On Thu, Jul 24, 2025 at 04:49:24PM +0200, Günther Noack wrote:

On Wed, Jul 23, 2025 at 11:01:42PM +0200, Mickaël Salaün wrote:

quoted

On Tue, Jul 22, 2025 at 07:04:02PM +0100, Tingmao Wang wrote:

quoted

On the other hand, I'm still a bit uncertain about the domain check
semantics.  While it would not cause a rename to be allowed if it is
otherwise not allowed by any rules on or above the mountpoint, this gets a
bit weird if we have a situation where renames are allowed on the
mountpoint or everywhere, but not read/writes, however read/writes are
allowed directly on a file, but the dir containing that file gets
disconnected so the sandboxed application can't read or write to it.
(Maybe someone would set up such a policy where renames are allowed,
expecting Landlock to always prevent renames where additional permissions
would be exposed?)

In the above situation, if the file is then moved to a connected
directory, it will become readable/writable again.

We can generalize this issue to not only the end file but any component
of the path: disconnected directories.  In fact, the main issue is the
potential inconsistency of access checks over time (e.g. between two
renames).  This could be exploited to bypass the security checks done
for FS_REFER.

I see two solutions:

1. *Always* walk down to the IS_ROOT directory, and then jump to the
   mount point.  This makes it possible to have consistent access checks
   for renames and open/use.  The first downside is that that would
   change the current behavior for bind mounts that could get more
   access rights (if the policy explicitly sets rights for the hidden
   directories).  The second downside is that we'll do more walk.

2. Return -EACCES (or -ENOENT) for actions involving disconnected
   directories, or renames of disconnected opened files.  This second
   solution is simpler and safer but completely disables the use of
   disconnected directories and the rename of disconnected files for
   sandboxed processes.

It would be much better to be able to handle opened directories as
(object) capabilities, but that is not currently possible because of the
way paths are handled by the VFS and LSM hooks.

Tingmao, Günther, Jann, what do you think?

I have to admit that so far, I still failed to wrap my head around the
full patch set and its possible corner cases.  I hope I did not
misunderstand things all too badly below:

As far as I understand the proposed patch, we are "checkpointing" the
intermediate results of the path walk at every mount point boundary,
and in the case where we run into a disconnected directory in one of
the nested mount points, we restore from the intermediate result at
the previous mount point directory and skip to the next mount point.

Correct

Visually speaking, if the layout is this (where ":" denotes a
mountpoint boundary between the mountpoints MP1, MP2, MP3):

                          dirfd
                            |
          :                 V         :
	  :       ham <--- spam <--- eggs <--- x.txt
	  :    (disconn.)             :
          :                           :
  / <--- foo <--- bar <--- baz        :
          :                           :
    MP1                 MP2                  MP3

When a process holds a reference to the "spam" directory, which is now
disconnected, and invokes openat(dirfd, "eggs/x.txt", ...), then we
would:

  * traverse x.txt
  * traverse eggs (checkpointing the intermediate result) <-.
  * traverse spam                                           |
  * traverse ham                                            |
  * discover that ham is disconnected:                      |
     * restore the intermediate result from "eggs" ---------'
     * continue the walk at foo
  * end up at the root

So effectively, since the results from "spam" and "ham" are discarded,
we would traverse only the inodes in the outer and inner mountpoints
MP1 and MP3, but effectively return a result that looks like we did
not traverse MP2?

We'd still check MP2's inode, but otherwise yes.

Maybe (likely) I misread the code. :) It's not clear to me what the
thinking behind this is.  Also, if there was another directory in
between "spam" and "eggs" in MP2, wouldn't we be missing the access
rights attached to this directory?

Yes, we would ignore this access right because we don't know that the
path was resolved from spam.


Regarding the capability approach:

I agree that a "capability" approach would be the better solution, but
it seems infeasible with the existing LSM hooks at the moment.  I
would be in favor of it though.

Yes, it would be a new feature with potential important changes.

In the meantime, we still need a fix for disconnected directories, and
this fix needs to be backported.  That's why the capability approach is
not part of the two solutions. ;)

To spell it out a bit more explicitly what that would mean in my mind:

When a path is looked up relative to a dirfd, the path walk upwards
would terminate at the dirfd and use previously calculated access
rights stored in the associated struct file.  These access rights
would be determined at the time of opening the dirfd, similar to how we
are already storing the "truncate" access right today for regular
files.

(Remark: There might still be corner cases where we have to solve it
the hard way, if someone uses ".." together with a dirfd-relative
lookup.)

Yep, real capabilities don't have ".." in their design.  On Linux (and
Landlock), we need to properly handle "..", which is challenging.

I also looked at what it would take to change the LSM hooks to pass
the directory that the lookup was done relative to, but it seems that
this would have to be passed through a bunch of VFS callbacks as well,
which seems like a larger change.  I would be curious whether that
would be deemed an acceptable change.

—Günther


P.S. Related to relative directory lookups, there is some movement in
the BSDs as well to use dirfds as capabilities, by adding a flag to
open directories that enforces O_BENEATH on subsequent opens:

 * https://undeadly.org/cgi?action=article;sid=20250529080623
 * https://reviews.freebsd.org/D50371

(both found via https://news.ycombinator.com/item?id=44575361)

If a dirfd had such a flag, that would get rid of the corner case
above.

This would be nice but it would not solve the current issue because we
cannot force all processes to use this flag (which breaks some use
cases).

FYI, Capsicum is a more complete implementation:
https://man.freebsd.org/cgi/man.cgi?query=capsicum&sektion=4
See the vfs.lookup_cap_dotdot sysctl too.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help