Re: Unprivileged filesystem mounts

From: Demi Marie Obenour <hidden>
Date: 2025-03-20 06:26:45
Also in: linux-bcachefs, linux-fsdevel

On Wed, Mar 19, 2025 at 05:25:17PM -0400, Theodore Ts'o wrote:

On Wed, Mar 19, 2025 at 01:44:13PM -0400, Demi Marie Obenour wrote:


Note that this won't help if you have malicious hardware that
*pretends* to be a USB storage device, but which doesn't behave like
an honest storage device.  For example, reading a particular sector
could return one set of data at time T, and different data at time
T+X, with no intervening writes.  There is no real defense against this
attack, since there is no way that you can authenticate the external
storage device; you could have a registry of USB vendor and model IDs,
but a device can always lie about its ID numbers.

This attack can be defended against by sandboxing the filesystem driver
and copying files to trusted storage before using them.  You can
authenticate devices based on what port they are plugged into, and Qubes
OS is working on exactly that.

Copying files to trusted storage is not sufficient.  The problem is
that an untrustworthy storage device can still play games with
metadata blocks.  If you are willing to copy the entire storage device
to trustworthy storage, and then run fsck on the file system, and then
mount it, then *sure* that would help.  But if the storage device is
very large or very slow, this might not be practical.

Copying files is not sufficient on its own.  You need to _also_ sandbox
the file system driver, which defeats the attack you mentioned above:
the attacker can compromise the VM running the file system, but that
doesn't give the attacker anything particularly useful.
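
To make the inconsistent-read problem from earlier in this thread concrete,
here is a minimal sketch in Python.  Everything in it is hypothetical:
`FlakyDevice` stands in for a dishonest device, and `check_then_reread` and
`copy_then_check` are illustrative names, not any real driver interface.  It
only shows why data should be copied once and then validated from the copy.
It does not address the metadata games mentioned above, which is why the
driver itself still needs to be sandboxed.

```python
import hashlib


class FlakyDevice:
    """Hypothetical dishonest device: the first read of a sector returns
    one payload, every later read returns another, with no writes between."""

    def __init__(self, first: bytes, later: bytes):
        self._first = first
        self._later = later
        self._reads = 0

    def read_sector(self, n: int) -> bytes:
        self._reads += 1
        return self._first if self._reads == 1 else self._later


def check_then_reread(dev, n: int, expected_digest: str) -> bytes:
    """Vulnerable pattern: validate one read, then use a second read."""
    if hashlib.sha256(dev.read_sector(n)).hexdigest() != expected_digest:
        raise ValueError("sector failed validation")
    # The device is asked again, so it can answer differently this time.
    return dev.read_sector(n)


def copy_then_check(dev, n: int, expected_digest: str) -> bytes:
    """Safer pattern: read once into trusted memory, validate that copy,
    and only ever use that copy afterwards."""
    data = dev.read_sector(n)
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("sector failed validation")
    return data
```

With a device built as `FlakyDevice(good, evil)` and the digest of `good`,
the first function hands back `evil` even though validation passed, while
the second returns the validated `good` bytes.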


Like everything else, security and usability and performance and costs
are all engineering tradeoffs....

Is the tradeoff fundamental, or is it a consequence of Linux being a
monolithic kernel?  If Linux were a microkernel and every filesystem
driver ran as a userspace process with no access to anything but the
device it is accessing, then there would be no tradeoff when it comes to
filesystems: a compromised filesystem driver would have no more access
than the device itself would, so compromising a filesystem driver would
be of much less value to an attacker.  There is still the problem that
plug and play is incompatible with not trusting devices to identify
themselves, but that's a different concern.

Microkernels have historically been a performance disaster.  Yes, you
can invest a *vast* amount of effort into trying to make a microkernel
OS more performant, but in the meantime, the competing monolithic
kernel will have gotten even faster, or added more features, leaving
the microkernel in the dust.

The L4 family of microkernels, and especially seL4, show that
microkernels do not need to be slow.  I do agree that making a
microkernel-based OS fast is hard, but on the other hand, running an
entire Linux VM just to host a single application isn't exactly an
efficient use of resources either.  The latter is what systems like Kata
containers wind up doing.

The effort needed to create a new file system from scratch, taking it
all the way from the initial design, implementation, testing and
performance tuning, and making it something customers are comfortable
depending on for enterprise workloads, is between 50 and 100
engineer years.  This estimate came from looking at the development
effort needed for various file systems implemented on monolithic
kernels, including Digital's Advfs (part of Digital Unix and OSF/1),
IBM's AIX, and Sun's ZFS, as well as GPFS from IBM (although that was
a cluster file system, and the effort estimated from my talking to the
engineering managers and tech leads was around 200 person-years.)

I'm not sure how much harder it will be to make a performant file
system which is suitable for enterprise workloads from a performance,
feature, and stability perspective, *and* to make it secure against
storage devices which are outside the TCB, *and* to make it work on a
microkernel.  But I'm going to guess it would inflate these effort
estimates by at least 50%, if not more.

My understanding is that "secure against storage devices which are
outside the TCB" mostly requires two things:

1. Either a programming language in which memory safety vulnerabilities
   are difficult to introduce by accident, or a sandbox that ensures
   that a compromised file system driver cannot do more than cause file
   system operations to return wrong results.

2. A way to kill a file system that is caught in an infinite loop, is
   eating too much memory, or is otherwise the victim of a denial of
   service attack without crashing the whole system.  This is not needed
   if denial of service attacks are outside of your threat model.
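
As a rough illustration of point 2 only, here is a sketch in Python that
runs a helper process under a wall-clock timeout and an address-space limit,
so a helper stuck in a loop or eating memory is killed without affecting the
caller.  It assumes a Unix-like system; `run_confined` and its defaults are
hypothetical, and this is NOT a sandbox in the sense of point 1 (no syscall
filtering, no namespaces, no VM).

```python
import resource
import subprocess
import sys


def run_confined(argv, timeout_s=5.0, mem_bytes=512 * 1024 * 1024):
    """Run argv as a child process with a memory cap and a timeout.

    Returns a short status string.  Hypothetical helper for illustration.
    """

    def limit_memory():
        # Runs in the child just before exec: cap its address space.
        # Never try to raise an existing hard limit, only lower it.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = mem_bytes if hard == resource.RLIM_INFINITY else min(mem_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

    try:
        proc = subprocess.run(
            argv,
            preexec_fn=limit_memory,
            timeout=timeout_s,  # on expiry the child is killed and reaped
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        return "killed: timeout"
    if proc.returncode == 0:
        return "ok"
    return f"failed: exit {proc.returncode}"
```

For example, `run_confined([sys.executable, "-c", "while True: pass"],
timeout_s=0.5)` reports a timeout instead of hanging the caller.  A real
design would put the file system driver itself in the confined process and
treat any such failure as "this mount is dead", nothing more.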

I'm not asking you (or anyone else) to write a filesystem driver that
has no bugs in the face of arbitrarily corrupted input.  I _expect_ that
there will be bugs in this case.  Right now, Linux kernel file systems
are written in C and run in the kernel, which means that a bug can
easily result in a complete system compromise.

Of course, if we're just writing a super simple file system that is
suitable for backups and file transfers, but not much else, that would
probably take much less effort.  But if we need to support file
exchange with storage devices with NTFS or HFS, those aren't simple file
systems.  So the VM sandbox approach might still be the better way to go.

Certainly the VM sandbox is the simplest approach in the short term.

P.S.: For all that I may disagree with you on a lot of things, I am very
grateful for all the work you have put into making ext4 as solid a
filesystem as it is, as well as for your other innovations (like
creating /dev/{u,}random).
-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab

