Thread: 63 messages, 6 authors, 2023-11-21

Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

From: Kuan-Ying Lee (李冠穎) <hidden>
Date: 2023-09-13 08:12:24
Also in: kvmarm, linux-arch, linux-arm-kernel, linux-fsdevel, linux-mm, lkml

On Wed, 2023-08-23 at 14:13 +0100, Alexandru Elisei wrote:
Introduction
============

Arm has implemented memory coloring in hardware, and the feature is called
Memory Tagging Extension (MTE). It works by embedding a 4 bit tag in bits
59..56 of a pointer, and storing this tag to a reserved memory location.
When the pointer is dereferenced, the hardware compares the tag embedded in
the pointer (logical tag) with the tag stored in memory (allocation tag).
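
The tag placement described above can be illustrated with a small user-space
C sketch. It only models the bit layout (bits 59..56) and the comparison; on
real hardware the check is done by the CPU on each access. The helper names
logical_tag(), set_logical_tag() and tag_check_passes() are hypothetical and
do not come from the series.

```c
#include <stdint.h>

/* Position and width of the logical tag, as described in the text. */
#define TAG_SHIFT 56
#define TAG_MASK  0xfULL

/* Hypothetical helper: return the 4-bit logical tag in bits 59..56. */
static inline unsigned int logical_tag(uint64_t ptr)
{
	return (unsigned int)((ptr >> TAG_SHIFT) & TAG_MASK);
}

/* Hypothetical helper: return ptr with its logical tag replaced by tag. */
static inline uint64_t set_logical_tag(uint64_t ptr, unsigned int tag)
{
	ptr &= ~(TAG_MASK << TAG_SHIFT);
	return ptr | (((uint64_t)tag & TAG_MASK) << TAG_SHIFT);
}

/*
 * Hypothetical model of the tag check: the access is allowed only if the
 * logical tag in the pointer equals the allocation tag stored for the
 * memory being accessed.
 */
static inline int tag_check_passes(uint64_t ptr, unsigned int allocation_tag)
{
	return logical_tag(ptr) == (allocation_tag & TAG_MASK);
}
```

For example, a pointer produced by set_logical_tag(addr, 0xa) passes the
check against an allocation tag of 0xa and fails against any other value.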

The relation between memory and where the tag for that memory is stored is
static.

The memory where the tags are stored has so far been inaccessible to Linux.
This series aims to change that, by adding support for using the tag storage
memory only as data memory; tag storage memory cannot be itself tagged.


Implementation
==============

The series is based on v6.5-rc3 with these two patches cherry picked:

- mm: Call arch_swap_restore() from unuse_pte():

    https://lore.kernel.org/all/20230523004312.1807357-3-pcc@google.com/

- arm64: mte: Simplify swap tag restoration logic:

    https://lore.kernel.org/all/20230523004312.1807357-4-pcc@google.com/

The above two patches are queued for the v6.6 merge window:

    https://lore.kernel.org/all/20230702123821.04e64ea2c04dd0fdc947bda3@linux-foundation.org/

The entire series, including the above patches, can be cloned with:

$ git clone https://gitlab.arm.com/linux-arm/linux-ae.git \
	-b arm-mte-dynamic-carveout-rfc-v1

On the arm64 architecture side, an extension is being worked on that will
clarify how MTE tag storage reuse should behave. The extension will be
made public soon.

On the Linux side, MTE tag storage reuse is accomplished with the
following changes:

1. The tag storage memory is exposed to the memory allocator as a new
migratetype, MIGRATE_METADATA. It behaves similarly to MIGRATE_CMA, with
the restriction that it cannot be used to allocate tagged memory (tag
storage memory cannot be tagged). On tagged page allocation, the
corresponding tag storage is reserved via alloc_contig_range().

2. mprotect(PROT_MTE) is implemented by changing the pte prot to
PAGE_METADATA_NONE. When the page is next accessed, a fault is taken and
the corresponding tag storage is reserved.

3. When the code tries to copy tags to a page which doesn't have the tag
storage reserved, the tags are copied to an xarray and restored in
set_pte_at(), when the page is eventually mapped with the tag storage
reserved.

KVM support has not been implemented yet, because a non-MTE enabled VMA
can back the memory of an MTE-enabled VM. After there is a consensus on the
right approach on the memory management support, I will add it.

Explanations for the last two changes follow. The gist of it is that they
were added mostly because of races, and it is my intention to make the code
more robust.

PAGE_METADATA_NONE was introduced to avoid races with mprotect(PROT_MTE).
For example, migration can race with mprotect(PROT_MTE):
- thread 0 initiates migration for a page in a non-MTE enabled VMA and a
  destination page is allocated without tag storage.
- thread 1 handles an mprotect(PROT_MTE), the VMA becomes tagged, and an
  access turns the source page that is in the process of being migrated
  into a tagged page.
- thread 0 finishes migration and the destination page is mapped as tagged,
  but without tag storage reserved.
More details and examples can be found in the patches.

This race is also related to how tag restoring is handled when tag storage
is missing: when a tagged page is swapped out, the tags are saved in an
xarray indexed by swp_entry.val. When a page is swapped back in, if there
are tags corresponding to the swp_entry that the page will replace, the
tags are unconditionally restored, even if the page will be mapped as
untagged. Because the page will be mapped as untagged, tag storage was
not reserved when the page was allocated to replace the swp_entry which has
tags associated with it.

To get around this, save the tags in a new xarray, this time indexed by
pfn, and restore them when the same page is mapped as tagged.

This also solves another race, this time with copy_highpage(). In the
scenario where migration races with mprotect(PROT_MTE), before the page is
mapped, the contents of the source page are copied to the destination.
This includes tags, which will be copied to a page with missing tag
storage, which can lead to data corruption if the missing tag storage is
in use for data. So copy_highpage() has received a similar treatment to the
swap code, and the source tags are copied in the xarray indexed by the
destination page pfn.
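
The "save tags keyed by pfn, restore when the page is mapped as tagged" idea
can be modelled in user space as below. This is only a toy sketch, not the
kernel code: the series uses an xarray, while this uses a small fixed table,
and the names save_tags_for_pfn() and restore_tags_for_pfn(), the 4 KiB page
size and the one-byte-per-tag storage are all assumptions made for
illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_BYTES 4096	/* assumption: 4 KiB pages */
#define GRANULE_BYTES   16	/* MTE tags cover 16-byte granules */
#define TAGS_PER_PAGE   (PAGE_SIZE_BYTES / GRANULE_BYTES)
#define MAX_SAVED       8	/* toy capacity, stands in for the xarray */

struct saved_tags {
	int in_use;
	uint64_t pfn;
	uint8_t tags[TAGS_PER_PAGE];	/* one byte per 4-bit tag, for simplicity */
};

static struct saved_tags saved[MAX_SAVED];

/* Save the tags for a page that has no tag storage reserved yet. */
static inline int save_tags_for_pfn(uint64_t pfn, const uint8_t *tags)
{
	struct saved_tags *slot = NULL;

	for (int i = 0; i < MAX_SAVED; i++) {
		if (saved[i].in_use && saved[i].pfn == pfn) {
			slot = &saved[i];	/* overwrite an existing entry */
			break;
		}
		if (!saved[i].in_use && !slot)
			slot = &saved[i];	/* remember the first free slot */
	}
	if (!slot)
		return -1;			/* table full */

	slot->in_use = 1;
	slot->pfn = pfn;
	memcpy(slot->tags, tags, TAGS_PER_PAGE);
	return 0;
}

/*
 * Called when the page is finally mapped as tagged with tag storage
 * reserved: copy the saved tags out and drop the entry. Returns 1 if
 * tags were restored, 0 if nothing was saved for this pfn.
 */
static inline int restore_tags_for_pfn(uint64_t pfn, uint8_t *dst)
{
	for (int i = 0; i < MAX_SAVED; i++) {
		if (saved[i].in_use && saved[i].pfn == pfn) {
			memcpy(dst, saved[i].tags, TAGS_PER_PAGE);
			saved[i].in_use = 0;
			return 1;
		}
	}
	return 0;
}
```

The point of the model is the ordering: the copy or swap-in path only
records the tags, and nothing touches tag storage until the page is mapped
as tagged.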


Overview of the patches
=======================

Patches 1-3 do some preparatory work by renaming a few functions and a gfp
flag.

Patches 4-12 are arch independent and introduce MIGRATE_METADATA to the
page allocator.

Patches 13-18 are arm64 specific and add support for detecting the tag
storage region and onlining it with the MIGRATE_METADATA migratetype.

Patches 19-24 are arch independent and modify the page allocator to
call back into arch dependent functions to reserve metadata storage for an
allocation which requires metadata.

Patches 25-28 are mostly arm64 specific and implement the reservation and
freeing of tag storage on tagged page allocation. Patch #28 ("mm: sched:
Introduce PF_MEMALLOC_ISOLATE") adds a current flag, PF_MEMALLOC_ISOLATE,
which ignores page isolation limits; this is used by arm64 when reserving
tag storage in the same patch.

Patches 29-30 add arch independent support for doing mprotect(PROT_MTE)
when metadata storage is enabled.

Patches 31-37 are mostly arm64 specific and handle the restoring of tags
when tag storage is missing. The exceptions are patches 32 (adds the
arch_swap_prepare_to_restore() function) and 35 (adds PAGE_METADATA_NONE
support for THPs).

Testing
=======

To enable MTE dynamic tag storage:

- CONFIG_ARM64_MTE_TAG_STORAGE=y
- system_supports_mte() returns true
- kasan_hw_tags_enabled() returns false
- correct DTB node (for the specification, see commit "arm64: mte: Reserve
  tag storage memory")

Check dmesg for the message "MTE tag storage enabled" or grep for metadata
in /proc/vmstat.

I've tested the series using FVP with MTE enabled, but without support for
dynamic tag storage reuse. To simulate it, I've added two fake tag storage
regions in the DTB by splitting a 2GB region roughly into 33 slices of size
0x3e0_0000, and using 32 of them for tagged memory and one slice for tag
storage:
diff --git a/arch/arm64/boot/dts/arm/fvp-base-revc.dts b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
index 60472d65a355..bd050373d6cf 100644
--- a/arch/arm64/boot/dts/arm/fvp-base-revc.dts
+++ b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
@@ -165,10 +165,28 @@ C1_L2: l2-cache1 {
                };
        };
 
-       memory@80000000 {
+       memory0: memory@80000000 {
                device_type = "memory";
-               reg = <0x00000000 0x80000000 0 0x80000000>,
-                     <0x00000008 0x80000000 0 0x80000000>;
+               reg = <0x00 0x80000000 0x00 0x7c000000>;
+       };
+
+       metadata0: metadata@c0000000  {
+               compatible = "arm,mte-tag-storage";
+               reg = <0x00 0xfc000000 0x00 0x3e00000>;
+               block-size = <0x1000>;
+               memory = <&memory0>;
+       };
+
+       memory1: memory@880000000 {
+               device_type = "memory";
+               reg = <0x08 0x80000000 0x00 0x7c000000>;
+       };
+
+       metadata1: metadata@8c0000000  {
+               compatible = "arm,mte-tag-storage";
+               reg = <0x08 0xfc000000 0x00 0x3e00000>;
+               block-size = <0x1000>;
+               memory = <&memory1>;
        };
 
Hi Alexandru,

AFAIK, the above memory configuration means that there are two regions
of DRAM (0x80000000-0xfc000000 and 0x8_80000000-0x8_fc000000), and this
is called the PDD memory map.
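
As a quick sanity check of the addresses in the DTB hunk above, the layout
can be recomputed with a few lines of C. This is only a sketch of the
arithmetic: the slice size and the 32:1 split come from the cover letter,
while struct region and the helper names are hypothetical. The 32:1 ratio
matches MTE keeping 4 bits of tag per 16 bytes of data.

```c
#include <stdint.h>

#define SLICE_SIZE  0x3e00000ULL	/* slice size quoted in the cover letter */
#define DATA_SLICES 32ULL		/* slices used for tagged memory */

/* Hypothetical helper type: a physical range [base, base + size). */
struct region {
	uint64_t base;
	uint64_t size;
};

static inline uint64_t region_end(struct region r)
{
	return r.base + r.size;
}

/* Data memory: 32 slices starting at base. */
static inline struct region data_region(uint64_t base)
{
	struct region r = { base, DATA_SLICES * SLICE_SIZE };
	return r;
}

/* Tag storage: one slice placed directly above the data memory. */
static inline struct region tag_region(uint64_t base)
{
	struct region r = { region_end(data_region(base)), SLICE_SIZE };
	return r;
}
```

With base 0x80000000 this gives data memory of size 0x7c000000 ending at
0xfc000000 and tag storage at 0xfc000000-0xffe00000; with base 0x8_80000000
the tag storage starts at 0x8_fc000000. Both match the reg properties in the
hunk, so each 2GB window carries its own tag storage slice.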

Document [1] lists some constraints on tag memory, as below.

| The following constraints apply to the tag regions in DRAM:
| 1. The tag region cannot be interleaved with the data region.
| The tag region must also be above the data region within DRAM.
|
| 2. The tag region in the physical address space cannot straddle
| multiple regions of a memory map.
|
| PDD memory map is not allowed to have part of the tag region between
| 2GB-4GB and another part between 34GB-64GB.


I'm not sure if we can separate tag memory with the above
configuration. Or am I missing something?

[1] https://developer.arm.com/documentation/101569/0300/?lang=en
(Section 5.4.6.1)

Thanks,
Kuan-Ying Lee
        reserved-memory {


Alexandru Elisei (37):
  mm: page_alloc: Rename gfp_to_alloc_flags_cma ->
    gfp_to_alloc_flags_fast
  arm64: mte: Rework naming for tag manipulation functions
  arm64: mte: Rename __GFP_ZEROTAGS to __GFP_TAGGED
  mm: Add MIGRATE_METADATA allocation policy
  mm: Add memory statistics for the MIGRATE_METADATA allocation
    policy
  mm: page_alloc: Allocate from movable pcp lists only if
    ALLOC_FROM_METADATA
  mm: page_alloc: Bypass pcp when freeing MIGRATE_METADATA pages
  mm: compaction: Account for free metadata pages in
    __compact_finished()
  mm: compaction: Handle metadata pages as source for direct
    compaction
  mm: compaction: Do not use MIGRATE_METADATA to replace pages with
    metadata
  mm: migrate/mempolicy: Allocate metadata-enabled destination page
  mm: gup: Don't allow longterm pinning of MIGRATE_METADATA pages
  arm64: mte: Reserve tag storage memory
  arm64: mte: Expose tag storage pages to the MIGRATE_METADATA
    freelist
  arm64: mte: Make tag storage depend on ARCH_KEEP_MEMBLOCK
  arm64: mte: Move tag storage to MIGRATE_MOVABLE when MTE is
    disabled
  arm64: mte: Disable dynamic tag storage management if HW KASAN is
    enabled
  arm64: mte: Check that tag storage blocks are in the same zone
  mm: page_alloc: Manage metadata storage on page allocation
  mm: compaction: Reserve metadata storage in compaction_alloc()
  mm: khugepaged: Handle metadata-enabled VMAs
  mm: shmem: Allocate metadata storage for in-memory filesystems
  mm: Teach vma_alloc_folio() about metadata-enabled VMAs
  mm: page_alloc: Teach alloc_contig_range() about MIGRATE_METADATA
  arm64: mte: Manage tag storage on page allocation
  arm64: mte: Perform CMOs for tag blocks on tagged page
    allocation/free
  arm64: mte: Reserve tag block for the zero page
  mm: sched: Introduce PF_MEMALLOC_ISOLATE
  mm: arm64: Define the PAGE_METADATA_NONE page protection
  mm: mprotect: arm64: Set PAGE_METADATA_NONE for mprotect(PROT_MTE)
  mm: arm64: Set PAGE_METADATA_NONE in set_pte_at() if missing metadata
    storage
  mm: Call arch_swap_prepare_to_restore() before arch_swap_restore()
  arm64: mte: swap/copypage: Handle tag restoring when missing tag
    storage
  arm64: mte: Handle fatal signal in reserve_metadata_storage()
  mm: hugepage: Handle PAGE_METADATA_NONE faults for huge pages
  KVM: arm64: Disable MTE is tag storage is enabled
  arm64: mte: Enable tag storage management

 arch/arm64/Kconfig                       |  13 +
 arch/arm64/include/asm/assembler.h       |  10 +
 arch/arm64/include/asm/memory_metadata.h |  49 ++
 arch/arm64/include/asm/mte-def.h         |  16 +-
 arch/arm64/include/asm/mte.h             |  40 +-
 arch/arm64/include/asm/mte_tag_storage.h |  36 ++
 arch/arm64/include/asm/page.h            |   5 +-
 arch/arm64/include/asm/pgtable-prot.h    |   2 +
 arch/arm64/include/asm/pgtable.h         |  33 +-
 arch/arm64/kernel/Makefile               |   1 +
 arch/arm64/kernel/elfcore.c              |  14 +-
 arch/arm64/kernel/hibernate.c            |  46 +-
 arch/arm64/kernel/mte.c                  |  31 +-
 arch/arm64/kernel/mte_tag_storage.c      | 667 +++++++++++++++++++++++
 arch/arm64/kernel/setup.c                |   7 +
 arch/arm64/kvm/arm.c                     |   6 +-
 arch/arm64/lib/mte.S                     |  30 +-
 arch/arm64/mm/copypage.c                 |  26 +
 arch/arm64/mm/fault.c                    |  35 +-
 arch/arm64/mm/mteswap.c                  | 113 +++-
 fs/proc/meminfo.c                        |   8 +
 fs/proc/page.c                           |   1 +
 include/asm-generic/Kbuild               |   1 +
 include/asm-generic/memory_metadata.h    |  50 ++
 include/linux/gfp.h                      |  10 +
 include/linux/gfp_types.h                |  14 +-
 include/linux/huge_mm.h                  |   6 +
 include/linux/kernel-page-flags.h        |   1 +
 include/linux/migrate_mode.h             |   1 +
 include/linux/mm.h                       |  12 +-
 include/linux/mmzone.h                   |  26 +-
 include/linux/page-flags.h               |   1 +
 include/linux/pgtable.h                  |  19 +
 include/linux/sched.h                    |   2 +-
 include/linux/sched/mm.h                 |  13 +
 include/linux/vm_event_item.h            |   5 +
 include/linux/vmstat.h                   |   2 +
 include/trace/events/mmflags.h           |   5 +-
 mm/Kconfig                               |   5 +
 mm/compaction.c                          |  52 +-
 mm/huge_memory.c                         | 109 ++++
 mm/internal.h                            |   7 +
 mm/khugepaged.c                          |   7 +
 mm/memory.c                              | 180 +++++-
 mm/mempolicy.c                           |   7 +
 mm/migrate.c                             |   6 +
 mm/mm_init.c                             |  23 +-
 mm/mprotect.c                            |  46 ++
 mm/page_alloc.c                          | 136 ++++-
 mm/page_isolation.c                      |  19 +-
 mm/page_owner.c                          |   3 +-
 mm/shmem.c                               |  14 +-
 mm/show_mem.c                            |   4 +
 mm/swapfile.c                            |   4 +
 mm/vmscan.c                              |   3 +
 mm/vmstat.c                              |  13 +-
 56 files changed, 1834 insertions(+), 161 deletions(-)
 create mode 100644 arch/arm64/include/asm/memory_metadata.h
 create mode 100644 arch/arm64/include/asm/mte_tag_storage.h
 create mode 100644 arch/arm64/kernel/mte_tag_storage.c
 create mode 100644 include/asm-generic/memory_metadata.h