Re: [PATCH v3 06/13] mm: introduce generic lazy_mmu helpers
From: Kevin Brodsky <hidden>
Date: 2025-10-24 12:13:46
Also in: linux-arm-kernel, linux-mm, lkml, sparclinux, xen-devel

On 23/10/2025 21:52, David Hildenbrand wrote:
> On 15.10.25 10:27, Kevin Brodsky wrote:
>> [...]
>>
>> * madvise_*_pte_range() call arch_leave() in multiple paths, some followed by an immediate exit/rescheduling and some followed by a conditional exit. These functions assume that they are called with lazy MMU disabled and we cannot simply use pause()/resume() to address that. This patch leaves the situation unchanged by calling enable()/disable() in all cases.
>
> I'm confused, the function simply does
>
> (a) enables lazy mmu
> (b) does something on the page table
> (c) disables lazy mmu
> (d) does something expensive (split folio -> take sleepable locks, flushes tlb)
> (e) go to (a)

That step is conditional: we exit right away if pte_offset_map_lock() fails. The fundamental issue is that pause() must always be matched with resume(), but as those functions look today there is no situation where a pause() would always be matched with a resume(). Alternatively it should be possible to pause(), unconditionally resume() after the expensive operations are done and then leave() right away in case of failure. It requires restructuring and might look a bit strange, but can be done if you think it's justified.
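
The alternative structure described above (pause(), unconditional resume() after the expensive work, then leaving on failure) can be sketched as a small user-space model. This is only an illustration under stated assumptions: every `model_*` name is hypothetical and is not one of the helpers from the series, and `remap_ok[]` merely stands in for whether pte_offset_map_lock() succeeds after each expensive step.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the lazy MMU state; NOT the kernel's data structure. */
struct lazy_mmu_model {
    int enable_depth;   /* enable() calls not yet matched by disable() */
    bool paused;        /* true between pause() and resume() */
};

static void model_enable(struct lazy_mmu_model *s)
{
    s->enable_depth++;
}

static void model_disable(struct lazy_mmu_model *s)
{
    /* Leaving while still paused would be an unbalanced pause(). */
    assert(s->enable_depth > 0 && !s->paused);
    s->enable_depth--;
}

static void model_pause(struct lazy_mmu_model *s)
{
    assert(!s->paused);
    s->paused = true;
}

static void model_resume(struct lazy_mmu_model *s)
{
    assert(s->paused);
    s->paused = false;
}

/*
 * remap_ok[i] is a hypothetical stand-in for "pte_offset_map_lock()
 * succeeded after the i-th expensive operation".
 * Returns the number of page-table batches that were processed.
 */
static int model_pte_range(struct lazy_mmu_model *s, const bool *remap_ok, int n)
{
    int batches = 0;

    model_enable(s);
    for (int i = 0; i < n; i++) {
        batches++;          /* work on the page table */
        model_pause(s);     /* replaces the disable in step (c) */
        /* the expensive operation would run here, lazy MMU paused */
        model_resume(s);    /* unconditional: always matches the pause() */
        if (!remap_ok[i])
            break;          /* failure: leave right away, state balanced */
    }
    model_disable(s);
    return batches;
}
```

In this model the resume() on the failure path is the part that "might look a bit strange": lazy MMU is re-entered only to be left immediately, purely so that every pause() has a matching resume().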

> Why would we use enable/disable instead?
>
>> * x86/Xen is currently the only case where explicit handling is required for lazy MMU when context-switching. This is purely an implementation detail and using the generic lazy_mmu_mode_* functions would cause trouble when nesting support is introduced, because the generic functions must be called from the current task. For that reason we still use arch_leave() and arch_enter() there.
>
> How does this interact with patch #11?

It is a requirement for patch 11, in fact. If we called disable() when switching out a task, then lazy_mmu_state.enabled would (most likely) be false when scheduling it again. By calling the arch_* helpers when context-switching, we ensure lazy_mmu_state remains unchanged. This is consistent with what happens on all other architectures (which don't do anything about lazy_mmu when context-switching). lazy_mmu_state is the lazy MMU status *when the task is scheduled*, and should be preserved on a context-switch.
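
The point about preserving the per-task state can be illustrated with a simplified user-space model. It is a sketch under assumptions, not the x86/Xen implementation: `struct model_task`, `model_hw_lazy` and all `model_*` functions are hypothetical names, and the real context-switch code is more involved than a check of a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task lazy_mmu_state.enabled flag. */
struct model_task {
    bool lazy_mmu_enabled;
};

/* Hypothetical stand-in for the arch-level batching mode on the CPU. */
static bool model_hw_lazy;

static void model_arch_enter(void) { model_hw_lazy = true; }
static void model_arch_leave(void) { model_hw_lazy = false; }

/* Generic layer: updates the task state AND calls the arch helper. */
static void model_generic_enable(struct model_task *t)
{
    t->lazy_mmu_enabled = true;
    model_arch_enter();
}

static void model_generic_disable(struct model_task *t)
{
    t->lazy_mmu_enabled = false;
    model_arch_leave();
}

/*
 * Context switch: only the arch helpers are used, so the task's own
 * flag still describes its state the next time it is scheduled.
 */
static void model_switch_out(struct model_task *prev)
{
    if (prev->lazy_mmu_enabled)
        model_arch_leave();
}

static void model_switch_in(struct model_task *next)
{
    if (next->lazy_mmu_enabled)
        model_arch_enter();
}
```

Had `model_switch_out()` called `model_generic_disable()` instead, the flag would be cleared and `model_switch_in()` would have no way to know that lazy MMU mode should be re-entered.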

>> Note: x86 calls arch_flush_lazy_mmu_mode() unconditionally in a few places, but only defines it if PARAVIRT_XXL is selected, and we are removing the fallback in <linux/pgtable.h>. Add a new fallback definition to <asm/pgtable.h> to keep things building.
>
> I can see a call in __kernel_map_pages() and arch_kmap_local_post_map()/arch_kmap_local_post_unmap().
>
> I guess that is ... harmless/irrelevant in the context of this series?

It should be. arch_flush_lazy_mmu_mode() was only used by x86 before this series; we're adding new calls to it from the generic layer, but existing x86 calls shouldn't be affected.

- Kevin