Re: [RFC 1/1] fix NULL mnt [was Re: apparmor NULL pointer dereference on... | linux-security-module

Re: [RFC 1/1] fix NULL mnt [was Re: apparmor NULL pointer dereference on resume [efivarfs]]

From: Ard Biesheuvel <ardb@kernel.org>
Date: 2025-03-16 19:20:01
Also in: linux-efi, linux-fsdevel

On Sun, 16 Mar 2025 at 15:26, James Bottomley
[off-list ref] wrote:

On Sun, 2025-03-16 at 07:46 +0100, Christian Brauner wrote:

quoted

On Sat, Mar 15, 2025 at 02:41:43PM -0400, James Bottomley wrote:

[...]

quoted

However, there's another problem: the mntput after kernel_file_open
may synchronously call cleanup_mnt() (and thus deactivate_super())
if the open fails because it's marked MNT_INTERNAL, which is caused
by SB_KERNMOUNT.  I fixed this just by not passing the SB_KERNMOUNT
flag, which feels a bit hacky.

It actually isn't. We know that vfs_kern_mount() will always
resurface the single superblock that's exposed to userspace because
we've just taken a reference to it earlier in efivarfs_pm_notify().
So that SB_KERNMOUNT flag is ignored because no new superblock is
allocated. It would only matter if we'd end up allocating a new
superblock which we never do.

I agree with the above: fc->sb_flags never propagates to the existing
superblock.  However, nothing propagates the superblock flags back to
fc->sb_flags either.  The check in vfs_create_mount() is on fc-

quoted

sb_flags.  Since the code is a bit hard to follow I added a printk on

the path.mnt flags and sure enough it comes back with MNT_INTERNAL when
SB_KERNMOUNT is set.

quoted

And if we did it would be a bug because the superblock we allocate
could be reused at any time if a userspace task mounts efivarfs
before efivarfs_pm_notify() has destroyed it (or the respective
workqueue). But that superblock would then have SB_KERNMOUNT for
something that's not supposed to be one.

True, but the flags don't propagate to the superblock, so no bug.

quoted

And whether or not that helper mount has MNT_INTERNAL is immaterial
to what you're doing here afaict.

I think the problem is the call chain mntput() -> mntput_no_expire()
which directly calls cleanup_mnt() -> deactivate_super() if that flag
is set.  Though I don't see that kernel_file_open() could ever fail
except for some catastrophic reason like out of memory, so perhaps it
isn't worth quibbling about.

quoted

So not passing the SB_KERNMOUNT flag is the right thing (see devtmpfs
as well). You could slap a comment in here explaining that we never
allocate a new superblock so it's clear to people not familiar with
this particular code.

OK, so you agree that the code as written looks correct? Even if we
don't necessarily quite agree on why.

Thanks for making progress on this. It would be nice if we could fix
this before the v6.14 release, given that the code in question was
introduced this cycle.

And there's another suggestion from Al, to use inode_lock_nested() to
work around the lockdep warning. I can take care of that one, unless
you prefer to do it yourself?

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help