Thread (58 messages) 58 messages, 7 authors, 2025-09-15

Re: [PATCH v2 2/7] mm: introduce local state for lazy_mmu sections

From: Kevin Brodsky <hidden>
Date: 2025-09-12 08:48:37
Also in: linux-mm, linuxppc-dev, lkml, sparclinux, xen-devel

On 12/09/2025 10:04, David Hildenbrand wrote:
quoted
quoted
struct lazy_mmu_state {
     uint8_t enabled_count;
     bool paused;
Looking at the arm64 implementation, I'm thinking: instead of the paused
member, how about a PF_LAZY_MMU task flag? It would be set when lazy_mmu
is actually enabled (i.e. inside an enter()/leave() section, and not
inside a pause()/resume() section). This way, architectures could use
that flag directly to tell if lazy_mmu is enabled instead of reinventing
the wheel, all in slightly different ways. Namely:

* arm64 uses a thread flag (TIF_LAZY_MMU) - this is trivially replaced
with PF_LAZY_MMU
* powerpc and sparc use batch->active where batch is a per-CPU variable;
I expect this can also be replaced with PF_LAZY_MMU
* x86/xen is more complex as it has xen_lazy_mode which tracks both
LAZY_MMU and LAZY_CPU modes. I'd probably leave that one alone, unless a
Xen expert is motivated to refactor it.

With that approach, the implementation of arch_enter() and arch_leave()
becomes very simple (no tracking of lazy_mmu status) on arm64, powerpc
and sparc.

(Of course we could also have an "enabled" member in lazy_mmu_state
instead of PF_LAZY_MMU, there is no functional difference.)
No strong opinion, but to me it feels like PF_LAZY_MMU is rather "the
effective state when combining nested+paused", and might complicate
the code + sanity checks?

So we could maintain that in addition fairly easily of course from the
core instead of letting archs do that manually.

I would probably have to see the end result to judge whether removing
the "paused" bool makes things look more complicated or not.
Agreed, it is a little difficult to consider all the cases ahead of
time. What is clear to me though is that [paused] can be inferred from
[count + enabled], and conversely [enabled] from [count + paused]. As a
result I really wouldn't store both paused and enabled in task_struct -
duplicating information is how you create inconsistent states.

We can very easily introduce helpers to get the enabled/paused status
regardless of how they're stored. Since "enabled" is what we need to
know in most cases (arch checking the status), I would rather store
"enabled" than "paused". But indeed, let's see how it turns out in practice.
quoted
quoted
}

c) With that config, common-code lazy_mmu_*() functions implement the
updating of the lazy_mmu_state in task_struct and call into arch code
on the transition from 0->1, 1->0 etc.
Indeed, this is how I thought about it. There is actually quite a lot
that can be moved to the generic functions:
* Updating lazy_mmu_state
* Sanity checks on lazy_mmu_state (e.g. underflow/overflow)
* Bailing out if in_interrupt() (not done consistently across arch's at
the moment)
quoted
Maybe that can be done through exiting
arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() callbacks, maybe
we need more. I feel like
we might be able to implement that through the existing helpers.
We might want to rename them to align with the new generic helpers, but
yes otherwise the principle should remain unchanged.

In fact, we will also need to revive arch_flush_lazy_mmu_mode().
That's okay if it's all hidden behaind a sane core API.
quoted
Indeed,
in the nested situation, we need the following arch calls:

enter() -> arch_enter()
     enter() -> [nothing]
     leave() -> arch_flush()
leave() -> arch_leave()

leave() must always flush whatever arch state was batched, as may be
expected by the caller.

How does all that sound?
I am no expert on the "always flush when leaving", but it sounds
reasonable to me.
This is a core expectation for lazy_mmu: when leave() is called, any
batched state is flushed. The fact it currently happens unconditionally
when calling leave() is in fact what stops nesting from exploding on
arm64 with DEBUG_PAGEALLOC [1].

[1] https://lore.kernel.org/all/aEhKSq0zVaUJkomX@arm.com/ (local)
Which arch operations would you call from

pause()
continue()
I also wondered about that. I think the safest is to make them
respectively arch_leave() and arch_enter() - the flushing entailed by
arch_leave() might not be required, but it is safer. Additionally,
powerpc/sparc disable preemption while in lazy_mmu, so it seems like a
good idea to re-enable it while paused (by calling arch_leave()).

- Kevin
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help