Thread: 58 messages, 8 authors, 2025-10-28

Re: [PATCH v3 06/13] mm: introduce generic lazy_mmu helpers

From: Kevin Brodsky <hidden>
Date: 2025-10-24 12:13:46
Also in: linux-arm-kernel, linux-mm, lkml, sparclinux, xen-devel

On 23/10/2025 21:52, David Hildenbrand wrote:
> On 15.10.25 10:27, Kevin Brodsky wrote:
>> [...]
>>
>> * madvise_*_pte_range() call arch_leave() in multiple paths, some
>>   followed by an immediate exit/rescheduling and some followed by a
>>   conditional exit. These functions assume that they are called
>>   with lazy MMU disabled and we cannot simply use pause()/resume()
>>   to address that. This patch leaves the situation unchanged by
>>   calling enable()/disable() in all cases.
>
> I'm confused, the function simply does
>
> (a) enables lazy mmu
> (b) does something on the page table
> (c) disables lazy mmu
> (d) does something expensive (split folio -> take sleepable locks,
>     flushes tlb)
> (e) go to (a)

That step is conditional: we exit right away if pte_offset_map_lock()
fails. The fundamental issue is that pause() must always be matched with
resume(), but as those functions look today there is no situation where
a pause() would always be matched with a resume().

Alternatively it should be possible to pause(), unconditionally resume()
after the expensive operations are done and then leave() right away in
case of failure. It requires restructuring and might look a bit strange,
but can be done if you think it's justified.
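
To make the alternative above concrete, here is a minimal userspace sketch of that restructuring. It is not the kernel code: the model_* names, the state struct and the loop shape are hypothetical stand-ins invented for illustration, and model_disable() plays the role of the final "leave" step. The point it tries to show is that resume() is called unconditionally after the expensive step, so every pause() has a matching resume() even on the failure path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the lazy MMU state (not a kernel struct). */
struct lazy_mmu_model {
	bool enabled;	/* inside an enable()/disable() section */
	bool paused;	/* temporarily paused within that section */
};

/* Each helper asserts the pairing rule it relies on. */
static void model_enable(struct lazy_mmu_model *s)
{
	assert(!s->enabled);
	s->enabled = true;
}

static void model_disable(struct lazy_mmu_model *s)
{
	assert(s->enabled && !s->paused);
	s->enabled = false;
}

static void model_pause(struct lazy_mmu_model *s)
{
	assert(s->enabled && !s->paused);
	s->paused = true;
}

static void model_resume(struct lazy_mmu_model *s)
{
	assert(s->enabled && s->paused);
	s->paused = false;
}

/*
 * Walk nr_ptes entries. Entry split_at needs an expensive, sleepable
 * operation; relock_ok models whether re-taking the PTE lock succeeds
 * afterwards. Returns the number of entries fully processed.
 */
static int model_pte_range(struct lazy_mmu_model *s, int nr_ptes,
			   int split_at, bool relock_ok)
{
	int done = 0;

	model_enable(s);
	for (int i = 0; i < nr_ptes; i++) {
		if (i == split_at) {
			model_pause(s);
			/* expensive work would run here, lazy MMU paused */
			model_resume(s);	/* unconditional: matches the pause */
			if (!relock_ok)
				break;		/* failure: go straight to disable */
		}
		done++;
	}
	model_disable(s);
	return done;
}
```

In this model both the success and the failure path end with the state fully cleared, at the cost of a resume() that is immediately followed by disable() when the relock fails, which is the part that "might look a bit strange".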

> Why would we use enable/disable instead?
>
>> * x86/Xen is currently the only case where explicit handling is
>>   required for lazy MMU when context-switching. This is purely an
>>   implementation detail and using the generic lazy_mmu_mode_*
>>   functions would cause trouble when nesting support is introduced,
>>   because the generic functions must be called from the current task.
>>   For that reason we still use arch_leave() and arch_enter() there.
>
> How does this interact with patch #11?

It is a requirement for patch 11, in fact. If we called disable() when
switching out a task, then lazy_mmu_state.enabled would (most likely) be
false when scheduling it again.

By calling the arch_* helpers when context-switching, we ensure
lazy_mmu_state remains unchanged. This is consistent with what happens
on all other architectures (which don't do anything about lazy_mmu when
context-switching). lazy_mmu_state is the lazy MMU status *when the task
is scheduled*, and should be preserved on a context-switch.
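
The distinction can be illustrated with a small userspace model. Everything here is an assumption made for the sake of the example: the struct, the cpu_batching flag and the model_* functions are hypothetical and much simpler than the real x86/Xen context-switch code. The sketch only contrasts switching out via the arch-level helper, which leaves the per-task state alone, with switching out via the generic disable, which clears it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state, standing in for lazy_mmu_state.enabled. */
struct model_task {
	bool lazy_mmu_enabled;
};

/* Hypothetical per-CPU flag, standing in for the arch-level lazy mode. */
static bool cpu_batching;

static void model_arch_enter(void) { cpu_batching = true; }
static void model_arch_leave(void) { cpu_batching = false; }

/* Generic helpers: update the task state, then call into the arch layer. */
static void model_generic_enable(struct model_task *t)
{
	t->lazy_mmu_enabled = true;
	model_arch_enter();
}

static void model_generic_disable(struct model_task *t)
{
	t->lazy_mmu_enabled = false;
	model_arch_leave();
}

/* Switch-out via the arch helper only: the task state is preserved. */
static void model_switch_out(struct model_task *prev)
{
	if (prev->lazy_mmu_enabled)
		model_arch_leave();
}

/* Problematic variant: the generic disable clears the task state. */
static void model_switch_out_generic(struct model_task *prev)
{
	if (prev->lazy_mmu_enabled)
		model_generic_disable(prev);
}

/* Switch-in re-enters the arch mode only if the task state says so. */
static void model_switch_in(struct model_task *next)
{
	if (next->lazy_mmu_enabled)
		model_arch_enter();
}
```

With model_switch_out(), a task that was in lazy MMU mode gets batching back when it is switched in again. With model_switch_out_generic(), the task comes back with its state cleared and batching stays off for the rest of its section.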

>> Note: x86 calls arch_flush_lazy_mmu_mode() unconditionally in a few
>> places, but only defines it if PARAVIRT_XXL is selected, and we are
>> removing the fallback in <linux/pgtable.h>. Add a new fallback
>> definition to <asm/pgtable.h> to keep things building.
>
> I can see a call in __kernel_map_pages() and
> arch_kmap_local_post_map()/arch_kmap_local_post_unmap().
>
> I guess that is ... harmless/irrelevant in the context of this series?

It should be. arch_flush_lazy_mmu_mode() was only used by x86 before
this series; we're adding new calls to it from the generic layer, but
existing x86 calls shouldn't be affected.
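
For readers unfamiliar with the pattern, a fallback of this kind can be sketched in plain C as below. This is a userspace illustration, not the definition added by the patch: MODEL_PARAVIRT_XXL, model_flush_count and model_kernel_map_pages() are hypothetical names, and the "real" branch just counts calls. It shows why an unconditional caller keeps building and behaves as before when no real implementation is configured.

```c
#include <assert.h>

/* Counts flushes performed by the "real" implementation (model only). */
static int model_flush_count;

#ifdef MODEL_PARAVIRT_XXL
/* Stand-in for a configuration that provides a real implementation. */
static inline void arch_flush_lazy_mmu_mode(void)
{
	model_flush_count++;
}
#else
/* Fallback: nothing is batched, so there is nothing to flush. */
static inline void arch_flush_lazy_mmu_mode(void)
{
}
#endif

/* Hypothetical caller that flushes unconditionally. */
static void model_kernel_map_pages(void)
{
	/* page table updates would happen here */
	arch_flush_lazy_mmu_mode();
}
```

Built without MODEL_PARAVIRT_XXL, the call compiles to nothing and the counter stays at zero.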

- Kevin