Re: [PATCH v2 2/7] mm: introduce local state for lazy_mmu sections
From: David Hildenbrand <hidden>
Date: 2025-09-12 08:04:19
Also in:
linux-arm-kernel, linux-mm, lkml, sparclinux, xen-devel
quoted
struct lazy_mmu_state { uint8_t enabled_count; bool paused;Looking at the arm64 implementation, I'm thinking: instead of the paused member, how about a PF_LAZY_MMU task flag? It would be set when lazy_mmu is actually enabled (i.e. inside an enter()/leave() section, and not inside a pause()/resume() section). This way, architectures could use that flag directly to tell if lazy_mmu is enabled instead of reinventing the wheel, all in slightly different ways. Namely: * arm64 uses a thread flag (TIF_LAZY_MMU) - this is trivially replaced with PF_LAZY_MMU * powerpc and sparc use batch->active where batch is a per-CPU variable; I expect this can also be replaced with PF_LAZY_MMU * x86/xen is more complex as it has xen_lazy_mode which tracks both LAZY_MMU and LAZY_CPU modes. I'd probably leave that one alone, unless a Xen expert is motivated to refactor it. With that approach, the implementation of arch_enter() and arch_leave() becomes very simple (no tracking of lazy_mmu status) on arm64, powerpc and sparc. (Of course we could also have an "enabled" member in lazy_mmu_state instead of PF_LAZY_MMU, there is no functional difference.)
No strong opinion, but to me it feels like PF_LAZY_MMU is rather "the effective state when combining nested+paused", and might complicate the code + sanity checks? So we could maintain that in addition fairly easily of course from the core instead of letting archs do that manually. I would probably have to see the end result to judge whether removing the "paused" bool makes things look more complicated or not.
quoted
} c) With that config, common-code lazy_mmu_*() functions implement the updating of the lazy_mmu_state in task_struct and call into arch code on the transition from 0->1, 1->0 etc.Indeed, this is how I thought about it. There is actually quite a lot that can be moved to the generic functions: * Updating lazy_mmu_state * Sanity checks on lazy_mmu_state (e.g. underflow/overflow) * Bailing out if in_interrupt() (not done consistently across arch's at the moment)quoted
Maybe that can be done through exiting arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() callbacks, maybe we need more. I feel like we might be able to implement that through the existing helpers.We might want to rename them to align with the new generic helpers, but yes otherwise the principle should remain unchanged. In fact, we will also need to revive arch_flush_lazy_mmu_mode().
That's okay if it's all hidden behaind a sane core API.
Indeed, in the nested situation, we need the following arch calls: enter() -> arch_enter() enter() -> [nothing] leave() -> arch_flush() leave() -> arch_leave() leave() must always flush whatever arch state was batched, as may be expected by the caller. How does all that sound?
I am no expert on the "always flush when leaving", but it sounds reasonable to me. Which arch operations would you call from pause() continue() ?
quoted
And on top of the proposal above we will have some struct arch_lazy_mmu_state; define by the architecture (could be an empty struct on most). We can store that inside "struct lazy_mmu_state;" or if we ever have to, start returning only that from the enable/disable etc. functions.I'm not sure we'd want to mix those styles (task_struct member + local variable), that's adding complexity without much upside... Also having a local variable at every nesting level only makes sense if we have an arch callback regardless of nesting level, which is unnecessary in this proposed API.
Yes, that was rather a "if we ever really run out of space we could look into that", I am not a fan of it obviously. -- Cheers David / dhildenb