Re: [RFC 1/1] fix NULL mnt [was Re: apparmor NULL pointer dereference on resume [efivarfs]]
From: Ard Biesheuvel <ardb@kernel.org>
Date: 2025-03-16 19:20:01
Also in:
linux-efi, linux-fsdevel
On Sun, 16 Mar 2025 at 15:26, James Bottomley [off-list ref] wrote:
On Sun, 2025-03-16 at 07:46 +0100, Christian Brauner wrote:quoted
On Sat, Mar 15, 2025 at 02:41:43PM -0400, James Bottomley wrote:[...]quoted
quoted
However, there's another problem: the mntput after kernel_file_open may synchronously call cleanup_mnt() (and thus deactivate_super()) if the open fails because it's marked MNT_INTERNAL, which is caused by SB_KERNMOUNT. I fixed this just by not passing the SB_KERNMOUNT flag, which feels a bit hacky.It actually isn't. We know that vfs_kern_mount() will always resurface the single superblock that's exposed to userspace because we've just taken a reference to it earlier in efivarfs_pm_notify(). So that SB_KERNMOUNT flag is ignored because no new superblock is allocated. It would only matter if we'd end up allocating a new superblock which we never do.I agree with the above: fc->sb_flags never propagates to the existing superblock. However, nothing propagates the superblock flags back to fc->sb_flags either. The check in vfs_create_mount() is on fc-quoted
sb_flags. Since the code is a bit hard to follow I added a printk onthe path.mnt flags and sure enough it comes back with MNT_INTERNAL when SB_KERNMOUNT is set.quoted
And if we did it would be a bug because the superblock we allocate could be reused at any time if a userspace task mounts efivarfs before efivarfs_pm_notify() has destroyed it (or the respective workqueue). But that superblock would then have SB_KERNMOUNT for something that's not supposed to be one.True, but the flags don't propagate to the superblock, so no bug.quoted
And whether or not that helper mount has MNT_INTERNAL is immaterial to what you're doing here afaict.I think the problem is the call chain mntput() -> mntput_no_expire() which directly calls cleanup_mnt() -> deactivate_super() if that flag is set. Though I don't see that kernel_file_open() could ever fail except for some catastrophic reason like out of memory, so perhaps it isn't worth quibbling about.quoted
So not passing the SB_KERNMOUNT flag is the right thing (see devtmpfs as well). You could slap a comment in here explaining that we never allocate a new superblock so it's clear to people not familiar with this particular code.OK, so you agree that the code as written looks correct? Even if we don't necessarily quite agree on why.
Thanks for making progress on this. It would be nice if we could fix this before the v6.14 release, given that the code in question was introduced this cycle. And there's another suggestion from Al, to use inode_lock_nested() to work around the lockdep warning. I can take care of that one, unless you prefer to do it yourself?