Re: [PATCH v4 07/12] mm: enable lazy_mmu sections to nest
From: Kevin Brodsky <hidden>
Date: 2025-11-12 10:43:07
Also in:
linux-arm-kernel, linux-mm, lkml, sparclinux, xen-devel
On 11/11/2025 17:03, Ryan Roberts wrote:
On 11/11/2025 15:56, Kevin Brodsky wrote:quoted
On 11/11/2025 10:24, Ryan Roberts wrote:quoted
[...]quoted
quoted
quoted
+ state->active = true; + arch_enter_lazy_mmu_mode(); + } } static inline void lazy_mmu_mode_disable(void) { - arch_leave_lazy_mmu_mode(); + struct lazy_mmu_state *state = ¤t->lazy_mmu_state; + + VM_WARN_ON_ONCE(state->nesting_level == 0); + VM_WARN_ON(!state->active); + + if (--state->nesting_level == 0) { + state->active = false; + arch_leave_lazy_mmu_mode(); + } else { + /* Exiting a nested section */ + arch_flush_lazy_mmu_mode(); + } } static inline void lazy_mmu_mode_pause(void) { + struct lazy_mmu_state *state = ¤t->lazy_mmu_state; + + VM_WARN_ON(state->nesting_level == 0 || !state->active);nit: do you need the first condition? I think when nesting_level==0, we expect to be !active?I suppose this should never happen indeed - I was just being extra defensive. Either way David suggested allowing pause()/resume() to be called outside of any section so the next version will bail out on nesting_level == 0.Ignoring my current opinion that we don't need pause/resume at all for now; Are you suggesting that pause/resume will be completely independent of enable/disable? I think that would be best. So enable/disable increment and decrement the nesting_level counter regardless of whether we are paused. nesting_level 0 => 1 enables if not paused. nesting_level 1 => 0 disables if not paused. pause disables nesting_level >= 1, resume enables if nesting_level >= 1.This is something else. Currently the rules are: [A] // pausing forbidden enable() pause() // pausing/enabling forbidden resume() disable() David suggested allowing: [B] pause() // pausing/enabling forbidden resume() Your suggestion is also allowing: [C] pause() // pausing forbidden enable() disable() resume()I think the current kasan kasan_depopulate_vmalloc_pte() path will require [C] if CONFIG_DEBUG_PAGEALLOC is enabled on arm64. It calls __free_page() while paused. I guess CONFIG_DEBUG_PAGEALLOC will cause __free_page() -> debug_pagealloc_unmap_pages() ->->-> update_range_prot() -> lazy_mmu_enable().
Well, I really should have tried booting with KASAN enabled before... lazy_mmu_mode_enable() complains exactly as you predicted:
[ 1.047587] WARNING: CPU: 0 PID: 1 at include/linux/pgtable.h:273 update_range_prot+0x2dc/0x50c [ 1.048025] Modules linked in: [ 1.048296] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-rc3-00012-ga901e7f479f1 #142 PREEMPT [ 1.048706] Hardware name: FVP Base RevC (DT) [ 1.048941] pstate: 11400009 (nzcV daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 1.049309] pc : update_range_prot+0x2dc/0x50c [ 1.049631] lr : update_range_prot+0x80/0x50c [ 1.049950] sp : ffff8000800e6f20 [ 1.050162] x29: ffff8000800e6fb0 x28: ffff700010014000 x27: ffff700010016000 [ 1.050747] x26: 0000000000000000 x25: 0000000000000001 x24: 00000000008800f7 [ 1.051308] x23: 0000000000000000 x22: fff00008000f7000 x21: fff00008003009f8 [ 1.051884] x20: fff00008000f8000 x19: 1ffff0001001cdea x18: ffff800080769000 [ 1.052469] x17: ffff95c63264ec00 x16: ffff8000800e7504 x15: 0000000000000003 [ 1.053045] x14: ffff95c63482f000 x13: 0000000000000000 x12: ffff783ffc0007bf [ 1.053620] x11: 1ffff83ffc0007be x10: ffff783ffc0007be x9 : dfff800000000000 [ 1.054203] x8 : fffd80010001f000 x7 : ffffffffffffffff x6 : 0000000000000001 [ 1.054776] x5 : 0000000000000000 x4 : fff00008003009f9 x3 : 1ffe00010006013f [ 1.055348] x2 : fff0000800300000 x1 : 0000000000000001 x0 : 0000000000000000 [ 1.055912] Call trace: [ 1.056100] update_range_prot+0x2dc/0x50c (P) [ 1.056478] set_memory_valid+0x44/0x70 [ 1.056850] __kernel_map_pages+0x68/0xe4 [ 1.057226] __free_frozen_pages+0x528/0x1180 [ 1.057601] ___free_pages+0x11c/0x160 [ 1.057961] __free_pages+0x14/0x20 [ 1.058307] kasan_depopulate_vmalloc_pte+0xd4/0x184 [ 1.058748] __apply_to_page_range+0x678/0xda8 [ 1.059149] apply_to_existing_page_range+0x14/0x20 [ 1.059553] kasan_release_vmalloc+0x138/0x200 [ 1.059982] purge_vmap_node+0x1b4/0x8a0 [ 1.060371] __purge_vmap_area_lazy+0x4f8/0x870 [ 1.060779] _vm_unmap_aliases+0x488/0x6ec [ 1.061176] vm_unmap_aliases+0x1c/0x34 [ 1.061567] change_memory_common+0x17c/0x380 [ 1.061949] set_memory_ro+0x18/0x24 [...]
Arguably you could move the resume() to before the __free_page(). But it just illustrates that it's all a bit brittle at the moment...
Difficult to disagree. With things like DEBUG_PAGEALLOC it becomes very hard to know what is guaranteed not to use lazy MMU.
quoted
quoted
Perhaps we also need nested pause/resume? Then you just end up with 2 counters; enable_count and pause_count. Sorry if this has already been discussed.And finally: [D] pause() pause() enable() disable() resume() resume() I don't really mind either way, but I don't see an immediate use for [C] and [D] - the idea is that the paused section is short and controlled, not made up of arbitrary calls.If my thinking above is correct, then I've already demonstrated that this is not the case. So I'd be inclined to go with [D] on the basis that it is the most robust. Keeping 2 nesting counts (enable and pause) feels pretty elegant to me and gives the fewest opportunities for surprises.
Agreed, if we're going to allow enable() within a paused section, then we might as well allow paused sections to nest too. The use-case is clear, so I'm happy to go ahead and make those changes. David, any thoughts? - Kevin
Thanks, Ryanquoted
A potential downside of allowing [C] and [D] is that it makes it harder to detect unintended nesting (fewer VM_WARN assertions). Happy to implement it if this proves useful though. OTOH the idea behind [B] is that it allows the caller of pause()/resume() not to care about whether lazy MMU is actually enabled or not - i.e. the kasan helpers would keep working even if apply_to_page_range() didn't use lazy MMU any more.quoted
quoted
quoted
quoted
+ + state->active = false; arch_leave_lazy_mmu_mode(); } [...]