Thread (80 messages) 80 messages, 8 authors, 2025-11-24

Re: [PATCH v4 07/12] mm: enable lazy_mmu sections to nest

From: Kevin Brodsky <hidden>
Date: 2025-11-12 10:43:07
Also in: linux-arm-kernel, linux-mm, lkml, sparclinux, xen-devel

On 11/11/2025 17:03, Ryan Roberts wrote:
On 11/11/2025 15:56, Kevin Brodsky wrote:
quoted
On 11/11/2025 10:24, Ryan Roberts wrote:
quoted
[...]
quoted
quoted
quoted
+		state->active = true;
+		arch_enter_lazy_mmu_mode();
+	}
 }
 
 static inline void lazy_mmu_mode_disable(void)
 {
-	arch_leave_lazy_mmu_mode();
+	struct lazy_mmu_state *state = &current->lazy_mmu_state;
+
+	VM_WARN_ON_ONCE(state->nesting_level == 0);
+	VM_WARN_ON(!state->active);
+
+	if (--state->nesting_level == 0) {
+		state->active = false;
+		arch_leave_lazy_mmu_mode();
+	} else {
+		/* Exiting a nested section */
+		arch_flush_lazy_mmu_mode();
+	}
 }
 
 static inline void lazy_mmu_mode_pause(void)
 {
+	struct lazy_mmu_state *state = &current->lazy_mmu_state;
+
+	VM_WARN_ON(state->nesting_level == 0 || !state->active);
nit: do you need the first condition? I think when nesting_level==0, we expect
to be !active?
I suppose this should never happen indeed - I was just being extra
defensive.

Either way David suggested allowing pause()/resume() to be called
outside of any section so the next version will bail out on
nesting_level == 0.
Ignoring my current opinion that we don't need pause/resume at all for now; Are
you suggesting that pause/resume will be completely independent of
enable/disable? I think that would be best. So enable/disable increment and
decrement the nesting_level counter regardless of whether we are paused.
nesting_level 0 => 1 enables if not paused. nesting_level 1 => 0 disables if not
paused. pause disables nesting_level >= 1, resume enables if nesting_level >= 1.
This is something else. Currently the rules are:

[A]

// pausing forbidden
enable()
    pause()
    // pausing/enabling forbidden
    resume()
disable()

David suggested allowing:

[B]

pause()
// pausing/enabling forbidden
resume()

Your suggestion is also allowing:

[C]

pause()
    // pausing forbidden
    enable()
    disable()
resume()
I think the current kasan kasan_depopulate_vmalloc_pte() path will require [C]
if CONFIG_DEBUG_PAGEALLOC is enabled on arm64. It calls __free_page() while
paused. I guess CONFIG_DEBUG_PAGEALLOC will cause __free_page() ->
debug_pagealloc_unmap_pages() ->->-> update_range_prot() -> lazy_mmu_enable().
Well, I really should have tried booting with KASAN enabled before...
lazy_mmu_mode_enable() complains exactly as you predicted:
[    1.047587] WARNING: CPU: 0 PID: 1 at include/linux/pgtable.h:273
update_range_prot+0x2dc/0x50c
[    1.048025] Modules linked in:
[    1.048296] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted
6.18.0-rc3-00012-ga901e7f479f1 #142 PREEMPT
[    1.048706] Hardware name: FVP Base RevC (DT)
[    1.048941] pstate: 11400009 (nzcV daif +PAN -UAO -TCO +DIT -SSBS
BTYPE=--)
[    1.049309] pc : update_range_prot+0x2dc/0x50c
[    1.049631] lr : update_range_prot+0x80/0x50c
[    1.049950] sp : ffff8000800e6f20
[    1.050162] x29: ffff8000800e6fb0 x28: ffff700010014000 x27:
ffff700010016000
[    1.050747] x26: 0000000000000000 x25: 0000000000000001 x24:
00000000008800f7
[    1.051308] x23: 0000000000000000 x22: fff00008000f7000 x21:
fff00008003009f8
[    1.051884] x20: fff00008000f8000 x19: 1ffff0001001cdea x18:
ffff800080769000
[    1.052469] x17: ffff95c63264ec00 x16: ffff8000800e7504 x15:
0000000000000003
[    1.053045] x14: ffff95c63482f000 x13: 0000000000000000 x12:
ffff783ffc0007bf
[    1.053620] x11: 1ffff83ffc0007be x10: ffff783ffc0007be x9 :
dfff800000000000
[    1.054203] x8 : fffd80010001f000 x7 : ffffffffffffffff x6 :
0000000000000001
[    1.054776] x5 : 0000000000000000 x4 : fff00008003009f9 x3 :
1ffe00010006013f
[    1.055348] x2 : fff0000800300000 x1 : 0000000000000001 x0 :
0000000000000000
[    1.055912] Call trace:
[    1.056100]  update_range_prot+0x2dc/0x50c (P)
[    1.056478]  set_memory_valid+0x44/0x70
[    1.056850]  __kernel_map_pages+0x68/0xe4
[    1.057226]  __free_frozen_pages+0x528/0x1180
[    1.057601]  ___free_pages+0x11c/0x160
[    1.057961]  __free_pages+0x14/0x20
[    1.058307]  kasan_depopulate_vmalloc_pte+0xd4/0x184
[    1.058748]  __apply_to_page_range+0x678/0xda8
[    1.059149]  apply_to_existing_page_range+0x14/0x20
[    1.059553]  kasan_release_vmalloc+0x138/0x200
[    1.059982]  purge_vmap_node+0x1b4/0x8a0
[    1.060371]  __purge_vmap_area_lazy+0x4f8/0x870
[    1.060779]  _vm_unmap_aliases+0x488/0x6ec
[    1.061176]  vm_unmap_aliases+0x1c/0x34
[    1.061567]  change_memory_common+0x17c/0x380
[    1.061949]  set_memory_ro+0x18/0x24
[...]
Arguably you could move the resume() to before the __free_page(). But it just
illustrates that it's all a bit brittle at the moment...
Difficult to disagree. With things like DEBUG_PAGEALLOC it becomes very
hard to know what is guaranteed not to use lazy MMU.
quoted
quoted
Perhaps we also need nested pause/resume? Then you just end up with 2 counters;
enable_count and pause_count. Sorry if this has already been discussed.
And finally:

[D]

pause()
    pause()
        enable()
        disable()
    resume()
resume()

I don't really mind either way, but I don't see an immediate use for [C]
and [D] - the idea is that the paused section is short and controlled,
not made up of arbitrary calls. 
If my thinking above is correct, then I've already demonstrated that this is not
the case. So I'd be inclined to go with [D] on the basis that it is the most robust.

Keeping 2 nesting counts (enable and pause) feels pretty elegant to me and gives
the fewest opportunities for surprises.
Agreed, if we're going to allow enable() within a paused section, then
we might as well allow paused sections to nest too. The use-case is
clear, so I'm happy to go ahead and make those changes.

David, any thoughts?

- Kevin
Thanks,
Ryan
quoted
A potential downside of allowing [C] and
[D] is that it makes it harder to detect unintended nesting (fewer
VM_WARN assertions). Happy to implement it if this proves useful though.

OTOH the idea behind [B] is that it allows the caller of
pause()/resume() not to care about whether lazy MMU is actually enabled
or not - i.e. the kasan helpers would keep working even if
apply_to_page_range() didn't use lazy MMU any more.
quoted
quoted
quoted
quoted
+
+	state->active = false;
 	arch_leave_lazy_mmu_mode();
 }
 
[...]
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help