Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse

[PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 01/37] mm: page_alloc: Rename gfp_to_alloc_flags_cma -> gfp_to_alloc_flags_fast · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 02/37] arm64: mte: Rework naming for tag manipulation functions · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 03/37] arm64: mte: Rename __GFP_ZEROTAGS to __GFP_TAGGED · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 04/37] mm: Add MIGRATE_METADATA allocation policy · Alexandru Elisei <hidden> · 2023-08-23
Re: [PATCH RFC 04/37] mm: Add MIGRATE_METADATA allocation policy · Hyesoo Yu <hidden> · 2023-10-12
Re: [PATCH RFC 04/37] mm: Add MIGRATE_METADATA allocation policy · Alexandru Elisei <hidden> · 2023-10-16
Re: [PATCH RFC 04/37] mm: Add MIGRATE_METADATA allocation policy · Hyesoo Yu <hidden> · 2023-10-23
[PATCH RFC 05/37] mm: Add memory statistics for the MIGRATE_METADATA allocation policy · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 06/37] mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA · Alexandru Elisei <hidden> · 2023-08-23
Re: [PATCH RFC 06/37] mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA · Hyesoo Yu <hidden> · 2023-10-12
Re: [PATCH RFC 06/37] mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA · Alexandru Elisei <hidden> · 2023-10-16
Re: [PATCH RFC 06/37] mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA · Catalin Marinas <catalin.marinas@arm.com> · 2023-10-17
Re: [PATCH RFC 06/37] mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA · Hyesoo Yu <hidden> · 2023-10-23
Re: [PATCH RFC 06/37] mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA · Catalin Marinas <catalin.marinas@arm.com> · 2023-10-23
Re: [PATCH RFC 06/37] mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA · David Hildenbrand <hidden> · 2023-10-23
Re: [PATCH RFC 06/37] mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA · Catalin Marinas <catalin.marinas@arm.com> · 2023-10-23
Re: [PATCH RFC 06/37] mm: page_alloc: Allocate from movable pcp lists only if ALLOC_FROM_METADATA · David Hildenbrand <hidden> · 2023-10-23
[PATCH RFC 07/37] mm: page_alloc: Bypass pcp when freeing MIGRATE_METADATA pages · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 08/37] mm: compaction: Account for free metadata pages in __compact_finished() · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 09/37] mm: compaction: Handle metadata pages as source for direct compaction · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 10/37] mm: compaction: Do not use MIGRATE_METADATA to replace pages with metadata · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 11/37] mm: migrate/mempolicy: Allocate metadata-enabled destination page · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 12/37] mm: gup: Don't allow longterm pinning of MIGRATE_METADATA pages · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 13/37] arm64: mte: Reserve tag storage memory · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 14/37] arm64: mte: Expose tag storage pages to the MIGRATE_METADATA freelist · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 15/37] arm64: mte: Make tag storage depend on ARCH_KEEP_MEMBLOCK · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 18/37] arm64: mte: Check that tag storage blocks are in the same zone · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 17/37] arm64: mte: Disable dynamic tag storage management if HW KASAN is enabled · Alexandru Elisei <hidden> · 2023-08-23
Re: [PATCH RFC 17/37] arm64: mte: Disable dynamic tag storage management if HW KASAN is enabled · Hyesoo Yu <hidden> · 2023-10-12
Re: [PATCH RFC 17/37] arm64: mte: Disable dynamic tag storage management if HW KASAN is enabled · Alexandru Elisei <hidden> · 2023-10-16
[PATCH RFC 29/37] mm: arm64: Define the PAGE_METADATA_NONE page protection · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 19/37] mm: page_alloc: Manage metadata storage on page allocation · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 30/37] mm: mprotect: arm64: Set PAGE_METADATA_NONE for mprotect(PROT_MTE) · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 31/37] mm: arm64: Set PAGE_METADATA_NONE in set_pte_at() if missing metadata storage · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 33/37] arm64: mte: swap/copypage: Handle tag restoring when missing tag storage · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 32/37] mm: Call arch_swap_prepare_to_restore() before arch_swap_restore() · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 36/37] KVM: arm64: Disable MTE is tag storage is enabled · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 34/37] arm64: mte: Handle fatal signal in reserve_metadata_storage() · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 20/37] mm: compaction: Reserve metadata storage in compaction_alloc() · Alexandru Elisei <hidden> · 2023-08-23
Re: [PATCH RFC 20/37] mm: compaction: Reserve metadata storage in compaction_alloc() · Peter Collingbourne <hidden> · 2023-11-21
Re: [PATCH RFC 20/37] mm: compaction: Reserve metadata storage in compaction_alloc() · Alexandru Elisei <hidden> · 2023-11-21
[PATCH RFC 35/37] mm: hugepage: Handle PAGE_METADATA_NONE faults for huge pages · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 28/37] mm: sched: Introduce PF_MEMALLOC_ISOLATE · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 23/37] mm: Teach vma_alloc_folio() about metadata-enabled VMAs · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 26/37] arm64: mte: Perform CMOs for tag blocks on tagged page allocation/free · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 22/37] mm: shmem: Allocate metadata storage for in-memory filesystems · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 37/37] arm64: mte: Enable tag storage management · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 25/37] arm64: mte: Manage tag storage on page allocation · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 24/37] mm: page_alloc: Teach alloc_contig_range() about MIGRATE_METADATA · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 21/37] mm: khugepaged: Handle metadata-enabled VMAs · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 16/37] arm64: mte: Move tag storage to MIGRATE_MOVABLE when MTE is disabled · Alexandru Elisei <hidden> · 2023-08-23
[PATCH RFC 27/37] arm64: mte: Reserve tag block for the zero page · Alexandru Elisei <hidden> · 2023-08-23
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · David Hildenbrand <hidden> · 2023-08-24
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Catalin Marinas <catalin.marinas@arm.com> · 2023-08-24
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · David Hildenbrand <hidden> · 2023-08-24
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · David Hildenbrand <hidden> · 2023-08-24
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Catalin Marinas <catalin.marinas@arm.com> · 2023-08-24
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Alexandru Elisei <hidden> · 2023-09-06
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Catalin Marinas <catalin.marinas@arm.com> · 2023-09-11
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · David Hildenbrand <hidden> · 2023-09-11
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Catalin Marinas <catalin.marinas@arm.com> · 2023-09-13
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Hyesoo Yu <hidden> · 2023-10-25
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Alexandru Elisei <hidden> · 2023-10-25
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Hyesoo Yu <hidden> · 2023-10-25
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Catalin Marinas <catalin.marinas@arm.com> · 2023-10-27
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Kuan-Ying Lee (李冠穎) <hidden> · 2023-09-13
Re: [PATCH RFC 00/37] Add support for arm64 MTE dynamic tag storage reuse · Catalin Marinas <catalin.marinas@arm.com> · 2023-09-14

From: Alexandru Elisei <hidden>
Date: 2023-09-06 11:23:28
Also in: kvmarm, linux-arch, linux-fsdevel, linux-mm, linux-trace-kernel, lkml

Hi,

Thank you for the feedback!

Catalin did a great job explaining what this patch series does; I'll add my
own comments on top of his.

On Thu, Aug 24, 2023 at 04:24:30PM +0100, Catalin Marinas wrote:

On Thu, Aug 24, 2023 at 01:25:41PM +0200, David Hildenbrand wrote:

quoted

On 24.08.23 13:06, David Hildenbrand wrote:

quoted

On 24.08.23 12:44, Catalin Marinas wrote:

quoted

The way MTE is implemented currently is to have a static carve-out of
the DRAM to store the allocation tags (a.k.a. memory colour). This is
what we call the tag storage. Each 16 bytes have 4 bits of tags, so this
means 1/32 of the DRAM, roughly 3% used for the tag storage. This is
done transparently by the hardware/interconnect (with firmware setup)
and normally hidden from the OS. So a checked memory access to location
X generates a tag fetch from location Y in the carve-out and this tag is
compared with the bits 59:56 in the pointer. The correspondence from X
to Y is linear (subject to a minimum block size to deal with some
address interleaving). The software doesn't need to know about this
correspondence as we have specific instructions like STG/LDG to location
X that lead to a tag store/load to Y.
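
The arithmetic in the paragraph above can be sketched in a few lines of C.
This is only an illustration under stated assumptions: DRAM_BASE and
TAG_STORAGE_BASE are made-up addresses, the helper names are hypothetical,
and the mapping ignores the minimum block size and address interleaving
mentioned above, which real hardware has to account for.

```c
#include <stdint.h>

/* One 4-bit allocation tag covers one 16-byte granule. */
#define MTE_GRANULE_SIZE  16u
#define MTE_TAG_BITS      4u

/* Hypothetical physical layout, chosen only for this example. */
#define DRAM_BASE         0x080000000ull
#define TAG_STORAGE_BASE  0x8f8000000ull

/* Tag storage needed for `bytes` of data: 4 bits per 16 bytes, i.e. 1/32. */
static uint64_t tag_storage_bytes(uint64_t bytes)
{
	return bytes / MTE_GRANULE_SIZE * MTE_TAG_BITS / 8;
}

/*
 * Byte in the carve-out (location Y) that holds the tag for data address x
 * (location X), assuming an idealised linear mapping with no interleaving.
 */
static uint64_t tag_byte_addr(uint64_t x)
{
	uint64_t granule = (x - DRAM_BASE) / MTE_GRANULE_SIZE;

	return TAG_STORAGE_BASE + granule / 2;	/* two 4-bit tags per byte */
}

/* Logical tag carried in bits 59:56 of a pointer. */
static unsigned int ptr_tag(uint64_t ptr)
{
	return (unsigned int)((ptr >> 56) & 0xf);
}
```

With these definitions, tag_storage_bytes(4ull << 30) gives 128 MiB for
4 GiB of DRAM, and one 4 KiB page of data consumes 128 bytes of tag
storage, so a single 4 KiB tag storage page covers 32 data pages.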

Now, not all memory used by applications is tagged (mmap(PROT_MTE)).
For example, some large allocations may not use PROT_MTE at all or only
for the first and last page since initialising the tags takes time. The
side-effect is that of these 3% DRAM, only part, say 1% is effectively
used. Some people want the unused tag storage to be released for normal
data usage (i.e. give it to the kernel page allocator).

[...]

quoted

So it sounds like you might want to provide that tag memory using CMA.

That way, only movable allocations can end up on that CMA memory area,
and you can allocate selected tag pages on demand (similar to the
alloc_contig_range() use case).

That also solves the issue that such tag memory must not be longterm-pinned.

Regarding one complication: "The kernel needs to know where to allocate
a PROT_MTE page from or migrate a current page if it becomes PROT_MTE
(mprotect()) and the range it is in does not support tagging.",
simplified handling would be if it's in a MIGRATE_CMA pageblock, it
doesn't support tagging. You have to migrate to a !CMA page (for
example, not specifying GFP_MOVABLE as a quick way to achieve that).

Okay, I now realize that this patch set effectively duplicates some CMA
behavior using a new migrate-type.

Yes, pretty much, with some additional hooks to trigger migration. The
CMA mechanism was a great source of inspiration.

In addition, there are some races that are addressed mostly around page
migration/copying: the source page is untagged, the destination
allocated as untagged but before the copy an mprotect() makes the source
tagged (PG_mte_tagged set) and the copy_highpage() mechanism not having
anywhere to store the tags.

quoted

Yeah, that's probably not what we want just to identify if memory is
taggable or not.

Maybe there is a way to just keep reusing most of CMA instead.

A potential issue is that devices (mobile phones) may need a different
CMA range as well for DMA (and not necessarily in ZONE_DMA). Can
free_area[MIGRATE_CMA] handle multiple disjoint ranges? I don't see why
not as it's just a list.

I don't think that's a problem either. Today the user can specify multiple
CMA ranges on the kernel command line (via "cma", "hugetlb_cma", etc). CMA
already has the mechanism to keep track of multiple regions: it stores them
in the cma_areas array.

We (Google and Arm) went through a few rounds of discussions and
prototyping trying to find the best approach: (1) a separate free_area[]
array in each zone (early proof of concept from Peter C and Evgenii S,
https://github.com/google/sanitizers/tree/master/mte-dynamic-carveout),
(2) a new ZONE_METADATA, (3) a separate CPU-less NUMA node just for the
tag storage, (4) a new MIGRATE_METADATA type.

We settled on the latter as it closely resembles CMA without interfering
with it. I don't remember why we did not just go for MIGRATE_CMA, it may
have been the heterogeneous memory aspect and the fact that we don't
want PROT_MTE (VM_MTE) allocations from this range. If the hardware
allowed this, I think the patches would have been a bit simpler.

You are correct, we settled on a new migrate type because the tag storage
memory is fundamentally a different memory type with different properties
than the rest of the memory in the system: tag storage memory cannot be
tagged, MIGRATE_CMA memory can be tagged.

Alex can comment more next week on how we ended up with this choice but
if we find a way to avoid VM_MTE allocations from certain areas, I think
we can reuse the CMA infrastructure. A bigger hammer would be no VM_MTE
allocations from any CMA range but it seems too restrictive.

I considered mixing the tag storage memory with normal memory and
adding it to MIGRATE_CMA. But since tag storage memory cannot be tagged,
this means that it's not enough anymore to have a __GFP_MOVABLE allocation
request to use MIGRATE_CMA.

I considered two solutions to this problem:

1. Only allocate from MIGRATE_CMA if the requested memory is not tagged =>
this effectively means transforming all memory from MIGRATE_CMA into the
MIGRATE_METADATA migratetype that the series introduces. Not very
appealing, because that means treating normal memory that is also on the
MIGRATE_CMA lists as tagged memory.

2. Keep track of which pages are tag storage at page granularity (either by
a page flag, or by checking that the pfn falls in one of the tag storage
regions, or by some other mechanism). When the page allocator takes free
pages from the MIGRATE_METADATA list to satisfy an allocation, compare the
gfp mask with the page type, and if the allocation is tagged and the page
is a tag storage page, put it back at the tail of the free list and choose
the next page. Repeat until the page allocator finds a normal memory page
that can be tagged (some refinements are obviously needed to avoid
infinite loops).
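
To make solution 2 a bit more concrete, here is a minimal user-space
sketch of the "skip and rotate" loop, not kernel code. Everything in it is
hypothetical: the pfn ranges in tag_regions[], the toy circular free list,
the function names, and the choice of bounding the search by the current
list length as one possible way to avoid the infinite loop. Returning 0
as the failure value assumes pfn 0 is never on the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tag storage regions, as [start, end) pfn ranges. */
struct pfn_range { unsigned long start, end; };

static const struct pfn_range tag_regions[] = {
	{ 0x1000, 0x1100 },
	{ 0x8000, 0x8080 },
};

/* Page-granularity check: does the pfn fall in a tag storage region? */
static bool pfn_is_tag_storage(unsigned long pfn)
{
	for (size_t i = 0; i < sizeof(tag_regions) / sizeof(tag_regions[0]); i++)
		if (pfn >= tag_regions[i].start && pfn < tag_regions[i].end)
			return true;
	return false;
}

/* Toy free list: a circular buffer of pfns, head is handed out first. */
#define FREE_LIST_CAP 8
struct free_list {
	unsigned long pfn[FREE_LIST_CAP];
	size_t head, count;
};

static void free_list_push_tail(struct free_list *fl, unsigned long pfn)
{
	fl->pfn[(fl->head + fl->count) % FREE_LIST_CAP] = pfn;
	fl->count++;
}

static unsigned long free_list_pop_head(struct free_list *fl)
{
	unsigned long pfn = fl->pfn[fl->head];

	fl->head = (fl->head + 1) % FREE_LIST_CAP;
	fl->count--;
	return pfn;
}

/* Returns a pfn, or 0 if no suitable page was found. */
static unsigned long alloc_page_sketch(struct free_list *fl, bool want_tagged)
{
	size_t tries = fl->count;	/* look at each free page at most once */

	while (tries--) {
		unsigned long pfn = free_list_pop_head(fl);

		if (!want_tagged || !pfn_is_tag_storage(pfn))
			return pfn;
		/* Tag storage cannot be tagged: rotate it to the tail. */
		free_list_push_tail(fl, pfn);
	}
	return 0;
}
```

Even in this toy form, a tagged allocation may have to walk the whole
list before failing, and every allocation pays for the per-page check.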

I considered solution 2 to be more complicated than keeping track of tag
storage pages at the migratetype level. Conceptually, keeping two distinct
memory types on separate migrate types looked to me like the cleaner and
simpler solution.

Maybe I missed something; I'm definitely open to suggestions regarding
putting the tag storage pages on MIGRATE_CMA (or another migratetype) if
that's a better approach.

Might be worth pointing out that putting the tag storage memory on the
MIGRATE_CMA migratetype only changes how the page allocator allocates
pages; all the other changes to migration/compaction/mprotect/etc will
still be there, because they are needed not because of how the tag storage
memory is represented by the page allocator, but because tag storage memory
cannot be tagged, and regular memory can.

Thanks,
Alex

quoted

Another simpler idea to get started would be to just intercept the first
PROT_MTE, and allocate all CMA memory. In that case, systems that don't ever
use PROT_MTE can have that additional 3% of memory.

We had this on the table as well but the most likely deployment, at
least initially, is only some secure services enabling MTE with various
apps gradually moving towards this in time. So that's why the main
pushback from vendors is having this 3% reserved permanently. Even if
all apps use MTE, only the anonymous mappings are PROT_MTE, so still not
fully using the tag storage.

-- 
Catalin

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
