Re: do_change_type(): refuse to operate on unmounted/not ours mounts

From: Pavel Tikhomirov <hidden>
Date: 2025-07-31 08:12:08
Also in: linux-fsdevel, lkml, stable

On Thu, Jul 31, 2025 at 3:53 PM Christian Brauner [off-list ref] wrote:

On Thu, Jul 31, 2025 at 10:40:40AM +0800, Pavel Tikhomirov wrote:

quoted

If detached mounts are our only concern, it looks like the check instead of:

if (!check_mnt(mnt)) {
        err = -EINVAL;
        goto out_unlock;
}

could've been a more relaxed one:

if (mnt_detached(mnt)) {
        err = -EINVAL;
        goto out_unlock;
}

bool mnt_detached(struct mount *mnt)
{
        return !mnt->mnt_ns;
}

not to allow propagation change only on detached mounts. (As
umount_tree sets mnt_ns to NULL.)

Changing propagation settings on detached mounts is fine and shoud work?
Changing propagation settings on unmounted mounts not so much...

Sorry, it's my confused terminology, here by "detached" mounts I mean
mounts which were unmounted with umount2(MNT_DETACH), maybe I should
call them "unmounted" (e.g. s/mnt_detached/mnt_unmounted/).

And yes, I understand why we need to allow changing propagation on
mounts in anonymous namespace without being inside it, because one
can't just enter anonymous namespace.

I don't think that we need to change anything, just sharing my
thoughts that it could be more relaxed and will still protect us from
propagation setting on unmounted mounts.

quoted

Also in do_mount_setattr we have a more relaxed check too:

if ((mnt_has_parent(mnt) || !is_anon_ns(mnt->mnt_ns)) && !check_mnt(mnt))
        goto out;

Best Regards, Tikhomirov Pavel.

On Sun, Jul 27, 2025 at 5:01 AM Andrei Vagin [off-list ref] wrote:

quoted

On Sat, Jul 26, 2025 at 10:53 AM Al Viro [off-list ref] wrote:

quoted

On Sat, Jul 26, 2025 at 10:12:34AM -0700, Andrei Vagin wrote:

quoted

On Thu, Jul 24, 2025 at 4:00 PM Al Viro [off-list ref] wrote:

quoted

On Thu, Jul 24, 2025 at 01:02:48PM -0700, Andrei Vagin wrote:

quoted

Hi Al and Christian,

The commit 12f147ddd6de ("do_change_type(): refuse to operate on
unmounted/not ours mounts") introduced an ABI backward compatibility
break. CRIU depends on the previous behavior, and users are now
reporting criu restore failures following the kernel update. This change
has been propagated to stable kernels. Is this check strictly required?

Yes.

quoted

Would it be possible to check only if the current process has
CAP_SYS_ADMIN within the mount user namespace?

Not enough, both in terms of permissions *and* in terms of "thou
shalt not bugger the kernel data structures - nobody's priveleged
enough for that".

Al,

I am still thinking in terms of "Thou shalt not break userspace"...

Seriously though, this original behavior has been in the kernel for 20
years, and it hasn't triggered any corruptions in all that time.

For a very mild example of fun to be had there:
        mount("none", "/mnt", "tmpfs", 0, "");
        chdir("/mnt");
        umount2(".", MNT_DETACH);
        mount(NULL, ".", NULL, MS_SHARED, NULL);
Repeat in a loop, watch mount group id leak.  That's a trivial example
of violating the assertion ("a mount that had been through umount_tree()
is out of propagation graph and related data structures for good").

I wasn't referring to detached mounts. CRIU modifies mounts from
non-current namespaces.

quoted

As for the "CAP_SYS_ADMIN within the mount user namespace" - which
userns do you have in mind?

The user namespace of the target mount:
ns_capable(mnt->mnt_ns->user_ns, CAP_SYS_ADMIN)

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help