Thread (50 messages) 50 messages, 8 authors, 2024-03-12

Re: [RFC PATCH] fs: Add vfs_masks_device_ioctl*() helpers

From: Mickaël Salaün <mic@digikod.net>
Date: 2024-03-12 10:58:49
Also in: linux-fsdevel

On Tue, Mar 12, 2024 at 09:12:28AM +1100, Dave Chinner wrote:
On Mon, Mar 11, 2024 at 10:01:33AM +0100, Günther Noack wrote:
quoted
On Mon, Mar 11, 2024 at 12:03:13PM +1100, Dave Chinner wrote:
quoted
On Fri, Mar 08, 2024 at 12:03:01PM +0100, Günther Noack wrote:
quoted
On Fri, Mar 08, 2024 at 08:02:13AM +0100, Arnd Bergmann wrote:
quoted
On Fri, Mar 8, 2024, at 00:09, Dave Chinner wrote:
quoted
I have no idea what a "safe" ioctl means here. Subsystems already
restrict ioctls that can do damage if misused to CAP_SYS_ADMIN, so
"safe" clearly means something different here.
That was my problem with the first version as well, but I think
drawing the line between "implemented in fs/ioctl.c" and
"implemented in a random device driver fops->unlock_ioctl()"
seems like a more helpful definition.
Yes, sorry for the confusion - that is exactly what I meant to say with "safe".:

Those are the IOCTL commands implemented in fs/ioctl.c which do not go through
f_ops->unlocked_ioctl (or the compat equivalent).
Which means all the ioctls we wrequire for to manage filesystems are
going to be considered "unsafe" and barred, yes?

That means you'll break basic commands like 'xfs_info' that tell you
the configuration of the filesystem. It will prevent things like
online growing and shrinking, online defrag, fstrim, online
scrubbing and repair, etc will not worki anymore. It will break
backup utilities like xfsdump, and break -all- the device management
of btrfs and bcachefs filesystems.

Further, all the setup and management of -VFS functionality- like
fsverity and fscrypt is actually done at the filesystem level (i.e
through ->unlocked_ioctl, no do_vfs_ioctl()) so those are all going
to get broken as well despite them being "vfs features".

Hence from a filesystem perspective, this is a fundamentally
unworkable definition of "safe".
As discussed further up in this thread[1], we want to only apply the IOCTL
command filtering to block and character devices.  I think this should resolve
your concerns about file system specific IOCTLs?  This is implemented in patch
V10 going forward[2].
I think you misunderstand. I used filesystem ioctls as an obvious
counter argument to this "VFS-only ioctls are safe" proposal to show
that it fundamentally breaks core filesystem boot and management
interfaces. Operations to prepare filesystems for mount may require
block device ioctls to be run. i.e. block device ioctls are required
core boot and management interfaces.

Disallowing ioctls on block devices will break udev rules that set
up block devices on kernel device instantiation events. It will
break partitioning tools that need to read/modify/rescan the
partition table. This will prevent discard, block zeroing and
*secure erase* operations. It may prevent libblkid from reporting
optimal device IO parameters to filesystem utilities like mkfs. You
won't be able to mark block devices as read only.  Management of
zoned block devices will be impossible.

Then stuff like DM and MD devices (e.g. LVM, RAID, etc) simply won't
appear on the system because they can't be scanned, configured,
assembled, etc.

And so on.

The fundamental fact is that system critical block device ioctls are
implemented by generic infrastructure below the VFS layer. They have
their own generic ioctl layer - blkdev_ioctl() is equivalent of
do_vfs_ioctl() for the block layer.  But if we cut off everything
below ->unlocked_ioctl() at the VFS, then we simply can't run any
of these generic block device ioctls.

As I said: this proposal is fundamentally unworkable without
extensive white- and black-listing of individual ioctls in the
security policies. That's not really a viable situation, because
we're going to change code and hence likely silently break those
security policy lists regularly....
Landlock is an optional sandboxing mechanism targeting unprivileged
users/processes (even if it can of course be used by privileged ones).
This means that there is no global security policy for the whole system
(unlike SELinux, AppArmor...).  System administrators that need to
manage a file system or any block devices would just not sandbox
themselves.  Moreover, most block devices should only be accessible to
the root user (which makes root the only one able to send IOCTL commands
to these block devices).  In a nutshell, processes using boot and
management interfaces are already privileged and they don't use
Landlock.  For instance, a landlocked process cannot do any mount
action, which is documented and it makes sense for the sandboxing use
case (to avoid sandbox bypass).

However, it would be interesting to know if unprivileged users can
request legitimate IOCTL commands on block devices (on a generic
distro), and if this is required for a common file system use (i.e.
excluding administration tasks).  I think all required IOCTL for common
file system use are available through the file system, not block devices,
but please correct me if I'm wrong.  What is nice with this
LANDLOCK_ACCESS_FS_IOCTL_DEV approach is that user space can identify
(with path and dev major/minor) on which device IOCTLs should be
allowed.  This is simple to understand and the information to identify
such devices is already well known.  We can also allow IOCTLs on a set
of devices, e.g. /dev/snd/.

The goal of this patch series is to enable applications to sandbox
themselves and avoid an attacker (exploiting a bug in this application)
to send arbitrary IOCTL commands to any devices available to the user
running this application.  For this sandboxing use case, I think it
wouldn't be useful to differentiate between blkdev_ioctl()'s commands
and device-specific commands because we want to either allow all IOCTL
on a block device or deny most of them (not those handled by
do_vfs_ioctl(), e.g. FIOCLEX, but that's a detail because of the file
access rights).  This is a trade off to ease sandboxing while being able
to limit access to unneeded features (which could potentially be used to
bypass the sandbox, e.g. TTY's IOCTLs).
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help