Re: [PATCH v3 06/13] mm: introduce generic lazy_mmu helpers

[PATCH v3 00/13] Nesting support for lazy MMU mode · Kevin Brodsky <hidden> · 2025-10-15
[PATCH v3 01/13] powerpc/64s: Do not re-activate batched TLB flush · Kevin Brodsky <hidden> · 2025-10-15
[PATCH v3 02/13] x86/xen: simplify flush_lazy_mmu() · Kevin Brodsky <hidden> · 2025-10-15
Re: [PATCH v3 02/13] x86/xen: simplify flush_lazy_mmu() · Dave Hansen <hidden> · 2025-10-15
Re: [PATCH v3 02/13] x86/xen: simplify flush_lazy_mmu() · Kevin Brodsky <hidden> · 2025-10-16
[PATCH v3 03/13] powerpc/mm: implement arch_flush_lazy_mmu_mode() · Kevin Brodsky <hidden> · 2025-10-15
Re: [PATCH v3 03/13] powerpc/mm: implement arch_flush_lazy_mmu_mode() · David Hildenbrand <hidden> · 2025-10-23
Re: [PATCH v3 03/13] powerpc/mm: implement arch_flush_lazy_mmu_mode() · Kevin Brodsky <hidden> · 2025-10-24
Re: [PATCH v3 03/13] powerpc/mm: implement arch_flush_lazy_mmu_mode() · David Hildenbrand <hidden> · 2025-10-24
Re: [PATCH v3 03/13] powerpc/mm: implement arch_flush_lazy_mmu_mode() · Kevin Brodsky <hidden> · 2025-10-24
[PATCH v3 04/13] sparc/mm: implement arch_flush_lazy_mmu_mode() · Kevin Brodsky <hidden> · 2025-10-15
Re: [PATCH v3 04/13] sparc/mm: implement arch_flush_lazy_mmu_mode() · David Hildenbrand <hidden> · 2025-10-23
[PATCH v3 05/13] mm: introduce CONFIG_ARCH_LAZY_MMU · Kevin Brodsky <hidden> · 2025-10-15
Re: [PATCH v3 05/13] mm: introduce CONFIG_ARCH_LAZY_MMU · Mike Rapoport <rppt@kernel.org> · 2025-10-18
Re: [PATCH v3 05/13] mm: introduce CONFIG_ARCH_LAZY_MMU · Kevin Brodsky <hidden> · 2025-10-20
Re: [PATCH v3 05/13] mm: introduce CONFIG_ARCH_LAZY_MMU · David Hildenbrand <hidden> · 2025-10-23
[PATCH v3 06/13] mm: introduce generic lazy_mmu helpers · Kevin Brodsky <hidden> · 2025-10-15
Re: [PATCH v3 06/13] mm: introduce generic lazy_mmu helpers · Alexander Gordeev <agordeev@linux.ibm.com> · 2025-10-17
Re: [PATCH v3 06/13] mm: introduce generic lazy_mmu helpers · Kevin Brodsky <hidden> · 2025-10-20
Re: [PATCH v3 06/13] mm: introduce generic lazy_mmu helpers · David Hildenbrand <hidden> · 2025-10-23
Re: [PATCH v3 06/13] mm: introduce generic lazy_mmu helpers · Kevin Brodsky <hidden> · 2025-10-24
Re: [PATCH v3 06/13] mm: introduce generic lazy_mmu helpers · David Hildenbrand <hidden> · 2025-10-24
Re: [PATCH v3 06/13] mm: introduce generic lazy_mmu helpers · Kevin Brodsky <hidden> · 2025-10-24
Re: [PATCH v3 06/13] mm: introduce generic lazy_mmu helpers · David Hildenbrand <hidden> · 2025-10-27
Re: [PATCH v3 06/13] mm: introduce generic lazy_mmu helpers · Kevin Brodsky <hidden> · 2025-10-28
[PATCH v3 07/13] mm: enable lazy_mmu sections to nest · Kevin Brodsky <hidden> · 2025-10-15
Re: [PATCH v3 07/13] mm: enable lazy_mmu sections to nest · David Hildenbrand <hidden> · 2025-10-23
Re: [PATCH v3 07/13] mm: enable lazy_mmu sections to nest · Kevin Brodsky <hidden> · 2025-10-24
Re: [PATCH v3 07/13] mm: enable lazy_mmu sections to nest · David Hildenbrand <hidden> · 2025-10-24
Re: [PATCH v3 07/13] mm: enable lazy_mmu sections to nest · Kevin Brodsky <hidden> · 2025-10-24
[PATCH v3 08/13] arm64: mm: replace TIF_LAZY_MMU with in_lazy_mmu_mode() · Kevin Brodsky <hidden> · 2025-10-15
[PATCH v3 09/13] powerpc/mm: replace batch->active with in_lazy_mmu_mode() · Kevin Brodsky <hidden> · 2025-10-15
Re: [PATCH v3 09/13] powerpc/mm: replace batch->active with in_lazy_mmu_mode() · David Hildenbrand <hidden> · 2025-10-23
Re: [PATCH v3 09/13] powerpc/mm: replace batch->active with in_lazy_mmu_mode() · Kevin Brodsky <hidden> · 2025-10-24
[PATCH v3 10/13] sparc/mm: replace batch->active with in_lazy_mmu_mode() · Kevin Brodsky <hidden> · 2025-10-15
Re: [PATCH v3 10/13] sparc/mm: replace batch->active with in_lazy_mmu_mode() · David Hildenbrand <hidden> · 2025-10-23
[PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · Kevin Brodsky <hidden> · 2025-10-15
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · David Hildenbrand <hidden> · 2025-10-23
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · David Woodhouse <dwmw2@infradead.org> · 2025-10-24
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · David Hildenbrand <hidden> · 2025-10-24
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · David Woodhouse <dwmw2@infradead.org> · 2025-10-24
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · David Hildenbrand <hidden> · 2025-10-24
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> · 2025-10-24
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · David Hildenbrand <hidden> · 2025-10-24
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> · 2025-10-24
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · David Hildenbrand <hidden> · 2025-10-27
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · Demi Marie Obenour <hidden> · 2025-10-24
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · David Hildenbrand <hidden> · 2025-10-27
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · Kevin Brodsky <hidden> · 2025-10-27
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · Kevin Brodsky <hidden> · 2025-10-24
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · David Woodhouse <dwmw2@infradead.org> · 2025-10-24
Re: [PATCH v3 11/13] x86/xen: use lazy_mmu_state when context-switching · Kevin Brodsky <hidden> · 2025-10-27
[PATCH v3 12/13] mm: bail out of lazy_mmu_mode_* in interrupt context · Kevin Brodsky <hidden> · 2025-10-15
Re: [PATCH v3 12/13] mm: bail out of lazy_mmu_mode_* in interrupt context · David Hildenbrand <hidden> · 2025-10-23
Re: [PATCH v3 12/13] mm: bail out of lazy_mmu_mode_* in interrupt context · Kevin Brodsky <hidden> · 2025-10-24
[PATCH v3 13/13] mm: introduce arch_wants_lazy_mmu_mode() · Kevin Brodsky <hidden> · 2025-10-15
Re: [PATCH v3 13/13] mm: introduce arch_wants_lazy_mmu_mode() · David Hildenbrand <hidden> · 2025-10-23
Re: [PATCH v3 13/13] mm: introduce arch_wants_lazy_mmu_mode() · Kevin Brodsky <hidden> · 2025-10-24

From: Kevin Brodsky <hidden>
Date: 2025-10-24 14:32:50
Also in: linux-arm-kernel, linux-mm, lkml, sparclinux, xen-devel

On 24/10/2025 15:27, David Hildenbrand wrote:

On 24.10.25 14:13, Kevin Brodsky wrote:

quoted

On 23/10/2025 21:52, David Hildenbrand wrote:

quoted

On 15.10.25 10:27, Kevin Brodsky wrote:

quoted

[...]

* madvise_*_pte_range() call arch_leave() in multiple paths, some
    followed by an immediate exit/rescheduling and some followed by a
    conditional exit. These functions assume that they are called
    with lazy MMU disabled and we cannot simply use pause()/resume()
    to address that. This patch leaves the situation unchanged by
    calling enable()/disable() in all cases.

I'm confused, the function simply does

(a) enables lazy mmu
(b) does something on the page table
(c) disables lazy mmu
(d) does something expensive (split folio -> take sleepable locks,
     flushes tlb)
(e) go to (a)

That step is conditional: we exit right away if pte_offset_map_lock()
fails. The fundamental issue is that pause() must always be matched with
resume(), but as those functions look today there is no situation where
a pause() would always be matched with a resume().

We have matches enable/disable, so my question is rather "why" you are
even thinking about using pause/resume?

What would be the benefit of that? If there is no benefit then just
drop this from the patch description as it's more confusing than just
... doing what the existing code does :)

Ah sorry I misunderstood, I guess you originally meant: why would we use
pause()/resume()?

The issue is the one I mentioned in the commit message: using
enable()/disable(), we assume that the functions are called with lazy
MMU mode is disabled. Consider:

  lazy_mmu_mode_enable()
  madvise_cold_or_pageout_pte_range():
    lazy_mmu_mode_enable()
    ...
    if (need_resched()) {
      lazy_mmu_mode_disable()
      cond_resched() // lazy MMU still enabled
    }

This will explode on architectures that do not allow sleeping while in
lazy MMU mode. I'm not saying this is an actual problem - I don't see
why those functions would be called with lazy MMU mode enabled. But it
does go against the notion that nesting works everywhere.

quoted

Why would we use enable/disable instead?

quoted

* x86/Xen is currently the only case where explicit handling is
    required for lazy MMU when context-switching. This is purely an
    implementation detail and using the generic lazy_mmu_mode_*
    functions would cause trouble when nesting support is introduced,
    because the generic functions must be called from the current
task.
    For that reason we still use arch_leave() and arch_enter() there.

How does this interact with patch #11?

It is a requirement for patch 11, in fact. If we called disable() when
switching out a task, then lazy_mmu_state.enabled would (most likely) be
false when scheduling it again.

By calling the arch_* helpers when context-switching, we ensure
lazy_mmu_state remains unchanged. This is consistent with what happens
on all other architectures (which don't do anything about lazy_mmu when
context-switching). lazy_mmu_state is the lazy MMU status *when the task
is scheduled*, and should be preserved on a context-switch.

Okay, thanks for clarifying. That whole XEN stuff here is rather horrible.

Can't say I disagree... I tried to simplify it further, but the
Xen-specific "LAZY_CPU" mode makes it just too difficult.

quoted

Note: x86 calls arch_flush_lazy_mmu_mode() unconditionally in a few
places, but only defines it if PARAVIRT_XXL is selected, and we are
removing the fallback in <linux/pgtable.h>. Add a new fallback
definition to <asm/pgtable.h> to keep things building.

I can see a call in __kernel_map_pages() and
arch_kmap_local_post_map()/arch_kmap_local_post_unmap().

I guess that is ... harmless/irrelevant in the context of this series?

It should be. arch_flush_lazy_mmu_mode() was only used by x86 before
this series; we're adding new calls to it from the generic layer, but
existing x86 calls shouldn't be affected.

Okay, I'd like to understand the rules when arch_flush_lazy_mmu_mode()
would actually be required in such arch code, but that's outside of
the scope of your patch series.

Not too sure either. A little archaeology shows that the calls were
added by [1][2]. Chances are [1] is no longer relevant since lazy_mmu
isn't directly used in copy_page_range(). 

I think [2] is still required - __kernel_map_pages() can be called while
lazy MMU is already enabled, and AIUI the mapping changes should take
effect by the time __kernel_map_pages() returns. On arm64 we shouldn't
have this problem by virtue of __kernel_map_pages() using lazy_mmu
itself, meaning that the nested call to disable() will trigger a flush.
(This case is in fact the original motivation for supporting nesting.)

- Kevin

[1]
https://lore.kernel.org/all/1319573279-13867-2-git-send-email-konrad.wilk@oracle.com/ (local)
[2]
https://lore.kernel.org/all/1365703192-2089-1-git-send-email-boris.ostrovsky@oracle.com/ (local)

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help