Re: Apparmor move_mount mediation breaks mount tool in containers
From: John Johansen <john.johansen@canonical.com>
Date: 2023-12-05 18:34:50
Also in:
regressions
On 12/5/23 09:08, Christian Brauner wrote:
On Tue, Dec 05, 2023 at 09:45:35AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:quoted
[CCing Linus] Hi, top-posting for once, to make this easily accessible to everyone. Stéphane, many thx for your insights. I'm CCing Linus on this, as I guess he wants to be aware of this. Normally I try to sum up things somewhat when doing so, but this here likely won't end well. So all I'm saying is: commit 157a3537d6bc28 ("apparmor: Fix regression in mount mediation") [v6.7-rc1] that was also included in v6.6.3 broke containers in some setups. John described that commit with the words "it is a far from good solution. It is a stop gap." and later mentioned that "a CVE is coming for this and that having this unmediated in the tree isn't good either." For more details please check out the thread that on lore started with this message: https://lore.kernel.org/all/CA+enf=u0UmgjKrd98EYkxFu7FYV8dR1SBYJn_1b0Naq=3twbbQ@mail.gmail.com/ (local) Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. On 05.12.23 07:57, Stéphane Graber wrote:quoted
On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten Leemhuis) [off-list ref] wrote:quoted
On 04.12.23 14:14, John Johansen wrote:quoted
On 12/3/23 17:34, Stéphane Graber wrote:Side note: Stéphane, thx for CCing the regressions list.quoted
quoted
On Sun, Dec 3, 2023 at 8:21 PM John Johansen [off-list ref] wrote:quoted
On 12/2/23 17:20, Stéphane Graber wrote:quoted
Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is blocking new mounts for all LXC, LXD and Incus users (at least) on distributions using the newer version of util-linux. That's because for a simple mount like "mount -t tmpfs tmpfs /tmp", the new mount command now performs:fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0 fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0 fsmount(3, FSMOUNT_CLOEXEC, 0) = 4 statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID, {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID, stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777, stx_size=40, ...}) = 0 move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0That last call to "move_mount" is incorrectly interpreted by AppArmor as an attempt to move-mount "/" to "/mnt" rather than as a new mount being created, this therefore results in:Dec 03 01:05:03 kernel-test kernel: audit: type=1400 audit(1701565503.599:34): apparmor="DENIED" operation="mount" class="mount" info="failed perms check" error=-13 profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount" srcname="/" flags="rw, move"move_mount() doesn't create a new mount it just moves an already existing mount around in the tree. Either that mount is moved (fsmount(), open_tree(OPEN_TREE_CLONE)) from it's source location or it was a detached mount to begin with.
yes, all to aware of this
The new mount api splits mounting across multiple system calls. And the state it keeps for itself is gone once you create a mount via fsmount().
yep
Because then the fs_context will be destroyed and you're only dealing with a superblock and mount(s) for it.
yep
You shouldn't care about move_mount() moving a detached mount because really it is not a move from some other filesystem location. The creation of that mount will have already happened in open_tree(OPEN_TREE_CLONE) or in fsmount(). So that ship has sailed and move_mount() is the wrong place to do this.
yes and no. What we are specifically dong is mediating existing attached mounts as always and blocking move_mount from attaching the detached mount via move_mount. Which is very much something move_mount mediation should be able to do. It works as a stop gap for the new mount api bypassing the LSM at a very course level. This is why it requires a very general mount rule for apparmor to allow move_mount of the detached mount. A conditional that improves control around this is coming, but we can't make move_mount() mediation provide the full set of apparmor old mount mediation. Providing something close to what we were doing before is going to require new hooks, and likely specialized states tailored to the new mount api. Until that does happen it does mean a reduction in what apparmor policy can mediate. Today instead of being able to specify details about the mount you need a generic mount rule that just allows the application to do pretty much anything with mounts, soon you will be able to have an addition conditional that allows better control of move mount, and better logging of what is going on.
If the only thing that would be broken here is that you want to block moving mounts from a given source location then you need to pass down that this is an actual move to the LSM hook. Because I'm really not keen on giving LSMs access to is_anon_ns() or similar helpers. They're just too easy to misuse.
indeed the LSM hook here needs to have some more info passed in.
quoted
quoted
Overall, this still feels very rushed and broken to me. I understand that being able to trivially bypass AppArmor mount rules isn't great, but shipping code that makes the vast majority of said rules useless doesn't really feel like such an improvement. It's effectively turning what was a reasonably flexible policy engine around mount calls, into a binary switch to either allow everything or block everything.For a 1:1 mapping of AppArmor LSM rules from the old to the new mount api I expect that one will have to effectively redo everything internally and keep state across multiple system calls. IMO, this is really not the way to go. Instead it's probably wise to have a new mount mediation policy for the new mount api with different semantics. Because I have my doubts that a 1:1 mapping will even work and won't just cause you CVEs.
yes the mount policy will have to be extended to support the new api. Getting that right will take time.