Re: [PATCH v6 16/18] arm64/mm: Implement pte_batch_hint()

From: Catalin Marinas <catalin.marinas@arm.com>
Date: 2024-02-16 12:34:09
Also in: linux-arm-kernel, linux-mm, lkml

On Thu, Feb 15, 2024 at 10:32:03AM +0000, Ryan Roberts wrote:

> When core code iterates over a range of ptes and calls ptep_get() for
> each of them, if the range happens to cover contpte mappings, the number
> of pte reads becomes amplified by a factor of the number of PTEs in a
> contpte block. This is because for each call to ptep_get(), the
> implementation must read all of the ptes in the contpte block to which
> it belongs to gather the access and dirty bits.
>
> This causes a hotspot for fork(), as well as operations that unmap
> memory such as munmap(), exit and madvise(MADV_DONTNEED). Fortunately we
> can fix this by implementing pte_batch_hint() which allows their
> iterators to skip getting the contpte tail ptes when gathering the batch
> of ptes to operate on. This results in the number of PTE reads returning
> to 1 per pte.
>
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: David Hildenbrand <redacted>
> Tested-by: John Hubbard <jhubbard@nvidia.com>
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
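
For illustration only, the read amplification described in the commit message, and how a batch hint avoids it, can be modelled in user space roughly as follows. This is a sketch, not the patch itself: the bit positions, the 16-entry block size (assuming 4K pages), and the helpers read_pte(), ptep_get_model(), batch_hint(), walk_naive(), walk_hinted() and fill_table() are all hypothetical names invented for the example, and the hint is computed from an array index instead of from the ptep address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values only; assumes 4K pages with 16 PTEs per contpte block. */
#define CONT_PTES  16u
#define PTE_VALID  (UINT64_C(1) << 0)
#define PTE_AF     (UINT64_C(1) << 10)
#define PTE_CONT   (UINT64_C(1) << 52)
#define PTE_DIRTY  (UINT64_C(1) << 55)

typedef uint64_t pte_t;

static unsigned long nr_reads;	/* counts every simulated PTE load */

static pte_t read_pte(const pte_t *ptep)
{
	nr_reads++;
	return *ptep;
}

static int pte_valid_cont(pte_t pte)
{
	return (pte & (PTE_VALID | PTE_CONT)) == (PTE_VALID | PTE_CONT);
}

/*
 * Simplified model of ptep_get(): for an entry in a contpte block, read the
 * remaining entries of the block too, to gather their access/dirty bits.
 */
static pte_t ptep_get_model(const pte_t *table, size_t idx)
{
	pte_t pte = read_pte(&table[idx]);
	size_t base = idx & ~(size_t)(CONT_PTES - 1);
	size_t i;

	if (!pte_valid_cont(pte))
		return pte;

	for (i = base; i < base + CONT_PTES; i++) {
		if (i != idx)
			pte |= read_pte(&table[i]) & (PTE_AF | PTE_DIRTY);
	}
	return pte;
}

/* Hint: how many entries, starting at idx, are covered by the same block. */
static unsigned int batch_hint(size_t idx, pte_t pte)
{
	if (!pte_valid_cont(pte))
		return 1;
	return CONT_PTES - (unsigned int)(idx & (CONT_PTES - 1));
}

/* Visit every entry individually; returns the number of PTE loads. */
unsigned long walk_naive(const pte_t *table, size_t n)
{
	size_t i;

	nr_reads = 0;
	for (i = 0; i < n; i++)
		(void)ptep_get_model(table, i);
	return nr_reads;
}

/* Skip the contpte tail entries using the hint; returns the number of loads. */
unsigned long walk_hinted(const pte_t *table, size_t n)
{
	size_t i = 0;

	nr_reads = 0;
	while (i < n) {
		pte_t pte = ptep_get_model(table, i);

		i += batch_hint(i, pte);
	}
	return nr_reads;
}

/* Fill the table with valid entries, optionally marked contiguous. */
void fill_table(pte_t *table, size_t n, int cont)
{
	size_t i;

	assert(n % CONT_PTES == 0);	/* keep block reads in bounds */
	for (i = 0; i < n; i++)
		table[i] = PTE_VALID | (cont ? PTE_CONT : 0);
}
```

In this model, a 32-entry table made entirely of contpte mappings costs 512 loads with walk_naive() (16 per entry) and 32 loads with walk_hinted() (one per entry), which mirrors the "1 per pte" figure quoted above. A table without contpte mappings costs 32 loads either way.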
