Thread (19 messages), 4 authors, 2021-11-09

Re: [PATCH v4 01/11] mm: x86, arm64: add arch_has_hw_pte_young()

From: Yu Zhao <hidden>
Date: 2021-08-19 21:23:21
Also in: linux-arm-kernel, lkml

On Thu, Aug 19, 2021 at 3:19 AM Will Deacon [off-list ref] wrote:
On Wed, Aug 18, 2021 at 12:30:57AM -0600, Yu Zhao wrote:
Some architectures set the accessed bit in PTEs automatically, e.g.,
x86, and arm64 v8.2 and later. On architectures that do not have this
capability, clearing the accessed bit in a PTE triggers a page fault
following the TLB miss.

Being aware of this capability can help make better decisions, i.e.,
whether to limit the size of each batch of PTEs and the burst of
batches when clearing the accessed bit.

Signed-off-by: Yu Zhao <redacted>
---
 arch/arm64/include/asm/cpufeature.h | 19 ++++++-------------
 arch/arm64/include/asm/pgtable.h    | 10 ++++------
 arch/arm64/kernel/cpufeature.c      | 19 +++++++++++++++++++
 arch/arm64/mm/proc.S                | 12 ------------
 arch/arm64/tools/cpucaps            |  1 +
 arch/x86/include/asm/pgtable.h      |  6 +++---
 include/linux/pgtable.h             | 12 ++++++++++++
 mm/memory.c                         | 14 +-------------
 8 files changed, 46 insertions(+), 47 deletions(-)
Please cc linux-arm-kernel and the maintainers on arm64 patches.
Done. Also adding a link to the original post:
https://lore.kernel.org/patchwork/patch/1478354/
diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h
index 9bb9d11750d7..2020b9e818c8 100644
--- a/arch/arm64/include/asm/cpufeature.h
+++ b/arch/arm64/include/asm/cpufeature.h
@@ -776,6 +776,12 @@ static inline bool system_supports_tlb_range(void)
              cpus_have_const_cap(ARM64_HAS_TLB_RANGE);
 }

+/* Check whether hardware update of the Access flag is supported. */
+static inline bool system_has_hw_af(void)
+{
+     return IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && cpus_have_const_cap(ARM64_HW_AF);
+}
How accurate does this need to be? Heterogeneous (big/little) systems are
very common on arm64, so the existing code enables hardware access flag
unconditionally on CPUs that support it, meaning we could end up running
on a system where some CPUs have hardware update and others do not.

With your change, we only enable hardware access flag if _all_ CPUs support
it (and furthermore, we prevent late onlining of CPUs without the feature
if it was detected at boot). This sacrifices a lot of flexibility, particularly
if we end up tackling CPU errata in this area in future, and it's not clear
that it's really required for what you're trying to do.
It doesn't need to be accurate, but then my question is how helpful it
is if it's not accurate. Conversely, shouldn't all CPUs have it if
it's really helpful? So it seems to me questionable whether such
flexibility will be needed in the future -- AFAIK, there are no CPUs
(ARM or not) that behave this way at present. I agree we want to try
to be future-proof, but this usually comes at a cost. For this
specific case, we would need two functions, one detecting the
capability at the global level and one at the local level, to fully
exploit this theoretical flexibility.

The bottom line is I don't have a problem with having an additional
function to detect the capability at a global level. Note that the
specific concern in this patchset is that if a CPU thinks all other
CPUs have the capability and clears the accessed bit on many PTEs,
then the CPUs that don't have the capability may take the resulting
page faults. (This is different from the cow_user_page() case.)
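
To make the global/local distinction concrete, here is a minimal
userspace sketch, not kernel code. Every name in it (cpu_hw_af,
local_cpu_has_hw_af(), all_cpus_have_hw_af(), young_clear_batch(),
NR_CPUS_MODEL) is hypothetical and exists only for illustration; the
batch sizes are arbitrary. It assumes the conservative policy described
above: use large batches only when every CPU updates the accessed bit
in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a 4-CPU system; not a kernel API. */
#define NR_CPUS_MODEL 4

/* true if the modeled CPU sets the accessed bit in hardware */
static bool cpu_hw_af[NR_CPUS_MODEL];

/* Local level: does this one CPU have the capability? */
static bool local_cpu_has_hw_af(size_t cpu)
{
	assert(cpu < NR_CPUS_MODEL);
	return cpu_hw_af[cpu];
}

/* Global level: true only if every CPU has the capability. */
static bool all_cpus_have_hw_af(void)
{
	for (size_t cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (!cpu_hw_af[cpu])
			return false;
	return true;
}

/*
 * Choose how many PTEs to clear the accessed bit on in one batch.
 * If any CPU lacks the capability, cap the batch at 'limited' so
 * that CPU does not take a burst of page faults.
 */
static size_t young_clear_batch(size_t wanted, size_t limited)
{
	if (all_cpus_have_hw_af())
		return wanted;
	return wanted < limited ? wanted : limited;
}
```

In this model a single CPU without the capability is enough to make the
global check fail and shrink the batch, even when the CPU doing the
clearing has the capability itself.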