Thread: 25 messages, 9 authors, 2025-03-20

Re: Unprivileged filesystem mounts

From: Christian Brauner <brauner@kernel.org>
Date: 2025-03-11 11:01:54
Also in: linux-bcachefs, linux-fsdevel

On Tue, Mar 11, 2025 at 04:57:54PM +1100, Dave Chinner wrote:
> On Mon, Mar 10, 2025 at 10:19:57PM -0400, Demi Marie Obenour wrote:
> > People have stuff to get done.  If you disallow unprivileged filesystem
> > mounts, they will just use sudo (or equivalent) instead.
>
> I am not advocating that we disallow mounting of untrusted devices.
>
> > The problem is not that users are mounting untrusted filesystems.  The
> > problem is that mounting untrusted filesystems is unsafe.
> >
> > Making untrusted filesystems safe to mount is the only solution that
> > lets users do what they actually need to do. That means either actually
> > fixing the filesystem code,
>
> Yes, and the point I keep making is that we cannot provide that
> guarantee from the kernel for existing filesystems. We cannot detect
> all possible malicious tampering situations without cryptographically
> secure verification, and we can't generate full trust from nothing.
>
> The typical desktop policy of "probe and automount any device that
> is plugged in" prevents the user from examining the device to
> determine if it contains what it is supposed to contain.  The user
> is not given any opportunity to decide if trust is warranted before
> the kernel filesystem parser running in ring 0 is exposed to the
> malicious image.
>
> That's the fundamental policy problem we need to address: the user
> and/or admin is not in control of their own security because
> application developers and/or distro maintainers have decided they
> should not have a choice.
>
> In this situation, the choice of what to do *must* fall to the user,
> but the argument for "filesystem corruption is a CVE-worthy bug" is
> that the choice has been taken away from the user. That's what I'm
> saying needs to change - the choice needs to be returned to the
> user...
>
> > or running it in a sufficiently tight
> > sandbox that vulnerabilities in it are of too low importance to matter.
> > libguestfs+FUSE is the most obvious way to do this, but the performance
> > might not be enough for distros to turn it on.
>
> Yes, I have advocated for that to be used for desktop mounts in the
> past. Similarly, I have also advocated for liblinux + FUSE to be
> used so that the kernel filesystem code is used but run from a
> userspace context where the kernel cannot be compromised.
>
> I have also advocated for user removable devices to be encrypted by
> default. The act of the user unlocking the device automatically
> marks it as trusted because undetectable malicious tampering is
> highly unlikely.
>
> I have also advocated for a device registry that records removable
> device signatures and whether the user trusted them or not so that
> they only need to be prompted once for any given removable device
> they use.
>
> There are *many* potential user-friendly solutions to the problem,
> but they -all- lie in the domain of userspace applications and/or
> policies. This is *not* a problem more or better code in the kernel
> can solve.

Strongly agree.

> Kees and Co keep telling us we should be making changes that make it
> harder (or completely prevent) entire classes of vulnerabilities
> from being exploited. Yet every time we suggest that a more secure
> policy should be applied to automounting filesystems to prevent
> system compromise on device hotplug, nobody seems to be willing to
> put security first.

I agree with Dave here a lot.

The case where arbitrary devices stuck into a laptop (e.g., USB sticks)
are mounted isn't solved by making a filesystem mountable unprivileged.
The mounted device cannot show up in the global mount namespace
somewhere since the user doesn't own the initial mount+user namespace.
So it's pointless. In other words, there are filesystem-level checks and
mount-namespace-based checks. Circumventing the latter means that
any user can just mount the device at any location in the global mount
namespace and therefore simply overmount other stuff.

The other thing is whether or not a filesystem is allowed to be mounted
from within an unprivileged user namespace. That is not a policy decision
the kernel can make, should make, or has to make. That is a road to
security disaster.

The new mount API has built-in delegation capabilities for exactly this
reason and use-case, so the kernel doesn't have to do that. Policy like
that belongs in userspace. The new mount API makes it possible for
userspace to correctly and safely delegate any filesystem mount to
unprivileged users. It is, e.g., heavily used by BPF to make bpffs, and
thus BPF, usable by unprivileged userspace and containers.
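For readers unfamiliar with the delegation pattern, here is a minimal C sketch of the general idea. It is not the code used by bpffs or systemd-mountfsd. A privileged helper builds a detached mount with fsopen()/fsconfig()/fsmount(), hands the resulting mount file descriptor to another process over an AF_UNIX socket with SCM_RIGHTS, and the receiver attaches it with move_mount(). The function names (make_detached_tmpfs, send_fd, recv_fd, attach_mount), the choice of tmpfs with size=16m, and the raw syscall() wrappers are illustrative assumptions; Linux 5.2+ kernel headers are assumed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <linux/mount.h>  /* FSOPEN_*, FSCONFIG_*, FSMOUNT_*, MOUNT_ATTR_*, MOVE_MOUNT_* */
#include <string.h>
#include <sys/socket.h>
#include <sys/syscall.h>  /* SYS_fsopen, SYS_fsconfig, SYS_fsmount, SYS_move_mount */
#include <sys/uio.h>
#include <unistd.h>

/* Raw syscall wrappers (Linux >= 5.2). glibc >= 2.36 also provides
 * fsopen() etc. in <sys/mount.h>, deliberately not used here. */
static int sys_fsopen(const char *fs, unsigned int flags)
{ return (int)syscall(SYS_fsopen, fs, flags); }
static int sys_fsconfig(int fd, unsigned int cmd, const char *key,
                        const void *val, int aux)
{ return (int)syscall(SYS_fsconfig, fd, cmd, key, val, aux); }
static int sys_fsmount(int fd, unsigned int flags, unsigned int attrs)
{ return (int)syscall(SYS_fsmount, fd, flags, attrs); }
static int sys_move_mount(int from_dfd, const char *from, int to_dfd,
                          const char *to, unsigned int flags)
{ return (int)syscall(SYS_move_mount, from_dfd, from, to_dfd, to, flags); }

/* Privileged side (hypothetical): create a detached tmpfs mount and return
 * its fd. Nothing is attached to any mount namespace yet. The policy choice
 * (fs type, options, mount attributes) is made here, in userspace. */
int make_detached_tmpfs(void)
{
    int fs = sys_fsopen("tmpfs", FSOPEN_CLOEXEC);
    if (fs < 0)
        return -1;
    if (sys_fsconfig(fs, FSCONFIG_SET_STRING, "size", "16m", 0) < 0 ||
        sys_fsconfig(fs, FSCONFIG_CMD_CREATE, NULL, NULL, 0) < 0) {
        int saved = errno;
        close(fs);
        errno = saved;
        return -1;
    }
    int mnt = sys_fsmount(fs, FSMOUNT_CLOEXEC,
                          MOUNT_ATTR_NOSUID | MOUNT_ATTR_NODEV);
    int saved = errno;
    close(fs);
    errno = saved;
    return mnt;
}

/* Pass one fd over a connected AF_UNIX socket using SCM_RIGHTS. */
int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    memset(&u, 0, sizeof(u));
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf,
                          .msg_controllen = sizeof(u.buf) };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd sent by send_fd(); returns the new fd or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } u;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = u.buf,
                          .msg_controllen = sizeof(u.buf) };
    if (recvmsg(sock, &msg, MSG_CMSG_CLOEXEC) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (!c || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/* Receiving side (hypothetical): attach the detached mount at `path`.
 * move_mount() still requires CAP_SYS_ADMIN over the caller's own mount
 * namespace, e.g. "root" inside its own user + mount namespace. */
int attach_mount(int mnt, const char *path)
{
    return sys_move_mount(mnt, "", AT_FDCWD, path, MOVE_MOUNT_F_EMPTY_PATH);
}
```

In this sketch a privileged daemon would call make_detached_tmpfs() and then send_fd(), and the client would call recv_fd() and then attach_mount(). The split illustrates the point made above: which filesystem gets created, and with which mount attributes, is decided by the userspace daemon, while the kernel only enforces the capability checks on each individual syscall.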

There's already a generic API for this, which we presented at LSFMM 2023
[1]. It has proper security policies in place governing when and how even
a user who is not in a user namespace is allowed to mount an arbitrary
filesystem (device-based or not).

    NAME
    systemd-mountfsd.service, systemd-mountfsd - Disk Image File System Mount Service
    
    SYNOPSIS
    systemd-mountfsd.service
    
    /usr/lib/systemd/systemd-mountfsd
    
    DESCRIPTION
    systemd-mountfsd is a system service that dissects disk images, and
    returns mount file descriptors for the file systems contained therein to
    clients, via a Varlink IPC API.
    
    The disk images provided must contain a raw file system image or must
    follow the Discoverable Partitions Specification[1]. Before mounting any
    file systems, the authenticity of the disk image is established in one
    or a combination of the following ways:
    
    1. If the disk image is located in a regular file in one of the
       directories /var/lib/machines/, /var/lib/portables/,
       /var/lib/extensions/, /var/lib/confexts/ or their counterparts in
       /etc/, /run/ or /usr/lib/, it is assumed to be trusted.
    
    2. If the disk image contains a Verity-enabled disk image, along with a
       signature partition with a key in the kernel keyring or in
       /etc/verity.d/ (and related directories), the disk image is
       considered trusted.

    This service provides one Varlink[2] service:
    io.systemd.MountFileSystem, which accepts a file descriptor to a
    regular file or block device and returns a number of fsmount() file
    descriptors that the client may then attach to paths of its choice.
    
    The returned mounts are automatically allowlisted in the
    per-user-namespace allowlist maintained by
    systemd-nsresourced.service(8).

    The file systems are automatically fsck(8)'ed before mounting.

    NOTES
    1. Discoverable Partitions Specification
       https://uapi-group.org/specifications/specs/discoverable_partitions_specification/

    2. Varlink
       https://varlink.org/

This work has now also been expanded to cover plain directory trees and
will be available in the next release.

It is currently part of systemd, but as with a lot of other such tools,
it can be used standalone on non-systemd systems, and where that isn't
possible yet, it can be made so.

[1]: https://youtu.be/RbMhupT3Dk4?si=pIGH5XPPUJ0m6bi0