Re: Unprivileged filesystem mounts

From: Dave Chinner <david@fromorbit.com>
Date: 2025-03-18 05:21:52
Also in: linux-bcachefs, linux-fsdevel

On Tue, Mar 11, 2025 at 04:10:42PM -0400, Demi Marie Obenour wrote:

On Tue, Mar 11, 2025 at 04:57:54PM +1100, Dave Chinner wrote:

On Mon, Mar 10, 2025 at 10:19:57PM -0400, Demi Marie Obenour wrote:

People have stuff to get done.  If you disallow unprivileged filesystem
mounts, they will just use sudo (or equivalent) instead.

I am not advocating that we disallow mounting of untrusted devices.

The problem is
not that users are mounting untrusted filesystems.  The problem is that
mounting untrusted filesystems is unsafe.

Making untrusted filesystems safe to mount is the only solution that
lets users do what they actually need to do. That means either actually
fixing the filesystem code,

Yes, and the point I keep making is that we cannot provide that
guarantee from the kernel for existing filesystems. We cannot detect
all possible malicious tampering situations without cryptographically
secure verification, and we can't generate full trust from nothing.
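
A minimal sketch of that distinction, not taken from any filesystem: a
plain checksum can be recomputed by whoever rewrites the block, while a
keyed MAC cannot be forged without the key. The record layout, the
helper names, and the assumption that the key is held somewhere the
attacker cannot read are all hypothetical.

```python
import hashlib
import hmac
import zlib


def seal_crc(payload: bytes) -> bytes:
    """Append a CRC32, standing in for a non-cryptographic metadata checksum."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")


def verify_crc(block: bytes) -> bool:
    """Accept the block if the trailing CRC32 matches the payload."""
    payload, stored = block[:-4], block[-4:]
    return zlib.crc32(payload).to_bytes(4, "little") == stored


def seal_hmac(payload: bytes, key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed with a secret key."""
    return payload + hmac.new(key, payload, hashlib.sha256).digest()


def verify_hmac(block: bytes, key: bytes) -> bool:
    """Accept the block only if the tag was produced with the same key."""
    payload, stored = block[:-32], block[-32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, stored)


# Hypothetical metadata record rewritten by an attacker.
tampered = b"inode=42 size=9999999999"

# The attacker recomputes the CRC after tampering, so the check passes.
crc_accepts_forgery = verify_crc(seal_crc(tampered))

# Assumption: the real key is out of the attacker's reach, so the best
# they can do is seal the tampered record with a key of their own.
real_key = b"key held outside the device"
forged = seal_hmac(tampered, b"attacker guess")
hmac_accepts_forgery = verify_hmac(forged, real_key)
```

Here the CRC check accepts the tampered record and the HMAC check
rejects it. Neither says anything about whether the parser handling an
accepted record is itself free of bugs.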

Why is it not possible to provide that guarantee?  I'm not concerned
about infinite loops or deadlocks.  Is there a reason it is not possible
to prevent memory corruption?

You're asking me to prove that the on-disk filesystem format parsing
implementation is 100% provably correct. Not only that, you're
wanting me to say that journal replay copying incomplete,
unverifiable structure fragments over the top of existing disk
structures is 100% provably correct.

I am the person who architected the existing metadata validation
infrastructure that XFS uses, and so I know its limitations in
intimate detail. It is, by far, the closest thing we have to
complete runtime metadata validation in any Linux filesystem
(except maybe bcachefs), but it is nowhere near able to detect and
prevent 100% of potential structure corruptions.

It is *far from trivial* to validate all the weird corner cases that
exist in the on-disk format that have evolved over the last 3
decades. For the first 15 years of development, almost zero thought
was given to runtime validation of the on-disk format. People even
fought against introducing it at all. And despite this, we still
have to support the on-disk functionality those old, difficult to
validate, persistent structures describe.

[ And then there's some other random memory corruption bug in the
code, and all bets are off... ]

IOWs, no filesystem developer is ever going to give you a guarantee
that a filesystem implementation is free from memory corruption bugs
unless they've designed and implemented it from the ground up to be
100% safe from such issues. No such filesystem exists in the kernel,
and it will probably be years away before anything may exist to fill
that gap.

The typical desktop policy of "probe and automount any device that
is plugged in" prevents the user from examining the device to
determine if it contains what it is supposed to contain.  The user
is not given any opportunity to decide if trust is warranted before
the kernel filesystem parser running in ring 0 is exposed to the
malicious image.

That's the fundamental policy problem we need to address: the user
and/or admin is not in control of their own security because
application developers and/or distro maintainers have decided they
should not have a choice.

In this situation, the choice of what to do *must* fall to the user,
but the argument for "filesystem corruption is a CVE-worthy bug" is
that the choice has been taken away from the user. That's what I'm
saying needs to change - the choice needs to be returned to the
user...

I am 100% in favor of not automounting filesystems without user
interaction, but that only means that an exploit will require user
interaction.  Users need to get things done, and if their task requires
them to mount a not-fully-trusted filesystem image, then that is what they
will do, and they will typically do it in the most obvious way possible.
That most obvious way needs to be a safe way, and it needs to have good
enough performance that users don't go around looking for an unsafe way.

Well, yes, that is obvious, and not a point of contention at all,
as is evidenced by the list of solutions to this problem I outlined.

or running it in a sufficiently tight
sandbox that vulnerabilities in it are of too low importance to matter.
libguestfs+FUSE is the most obvious way to do this, but the performance
might not be enough for distros to turn it on.

Yes, I have advocated for that to be used for desktop mounts in the
past. Similarly, I have also advocated for liblinux + FUSE to be
used so that the kernel filesystem code is used but run from a
userspace context where the kernel cannot be compromised.

I have also advocated for user removable devices to be encrypted by
default. The act of the user unlocking the device automatically
marks it as trusted because undetectable malicious tampering is
highly unlikely.

That is definitely a good idea.

I have also advocated for a device registry that records removable
device signatures and whether the user trusted them or not so that
they only need to be prompted once for any given removable device
they use.
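
A rough userspace sketch of such a registry, assuming a device
"signature" is a hash over vendor, model, and serial strings and that
decisions are kept in a JSON file. Both are illustrative choices, and
the class and method names are hypothetical rather than any existing
tool.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


class DeviceRegistry:
    """Remembers the user's trust decision for each removable device."""

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load earlier decisions if the registry file already exists.
        self.decisions = json.loads(path.read_text()) if path.exists() else {}

    @staticmethod
    def signature(vendor: str, model: str, serial: str) -> str:
        """Hypothetical signature: a hash over identifying strings."""
        raw = "\0".join((vendor, model, serial)).encode()
        return hashlib.sha256(raw).hexdigest()

    def lookup(self, sig: str) -> Optional[bool]:
        """Return True/False for a recorded decision, None if never seen."""
        return self.decisions.get(sig)

    def record(self, sig: str, trusted: bool) -> None:
        """Store the decision and persist it so later sessions see it."""
        self.decisions[sig] = trusted
        self.path.write_text(json.dumps(self.decisions, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "devices.json"
    registry = DeviceRegistry(store)
    sig = DeviceRegistry.signature("ExampleVendor", "ExampleStick", "0001")

    first_seen = registry.lookup(sig)      # unknown device: prompt the user
    registry.record(sig, trusted=False)    # the user declined to trust it

    # A later session reloads the file and does not need to prompt again.
    remembered = DeviceRegistry(store).lookup(sig)
```

The mount policy then only prompts when `lookup()` returns `None`, and
otherwise applies whatever the user chose the first time.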

There are *many* potential user-friendly solutions to the problem,
but they -all- lie in the domain of userspace applications and/or
policies. This is *not* a problem more or better code in the kernel
can solve.

It is certainly possible to make a memory safe implementation of any
filesystem.

Spoken like a True Expert.

If the current implementation can't prevent memory
corruption if a malicious filesystem is mounted, that is a
characteristic of the implementation.

Ah, now I see what you are trying to do. You're building a strawman
around memory corruption that you can use the argument "we need to
reimplement everything in Rust" to knock down.

Sorry, not playing that game.

However, the root filesystem is not the only filesystem image that must
be mounted.  There is also a writable data volume, and that _cannot_ be
signed because it contains user data.  It is encrypted, but part of the
threat model for both Android and ChromeOS is an attacker who has gained
root or even kernel code execution and wants to retain their access
across device reboots. They can't tamper with the kernel or root
filesystem, and privileged userspace treats the data on the writable
filesystem as untrusted.  However, the attacker can replace the writable
filesystem image with anything they want,

And therein lies the attack a filesystem implementation can't defend
against: the attacker can rewrite the unencrypted block device to
contain anything they want, and that will then pass verification on
the next boot. Perhaps that's the class of storage attack you should
seek to prevent, not try to slap bandaids over trust model
violations or insinuate the only solution is to rewrite complex
subsystems in Rust....

-Dave.

-- 
Dave Chinner
david@fromorbit.com
