Thread (14 messages) 14 messages, 5 authors, 2025-03-19

Re: [PATCH v4 3/5] mm: madvise: implement lightweight guard page mechanism

From: Vlastimil Babka <hidden>
Date: 2024-10-29 10:32:29
Also in: linux-alpha, linux-arch, linux-kselftest, linux-mips, linux-mm, lkml

On 10/28/24 15:13, Lorenzo Stoakes wrote:
Implement a new lightweight guard page feature, that is regions of userland
virtual memory that, when accessed, cause a fatal signal to arise.

Currently users must establish PROT_NONE ranges to achieve this.

However this is very costly memory-wise - we need a VMA for each and every
one of these regions AND they become unmergeable with surrounding VMAs.

In addition repeated mmap() calls require repeated kernel context switches
and contention of the mmap lock to install these ranges, potentially also
having to unmap memory if installed over existing ranges.

The lightweight guard approach eliminates the VMA cost altogether - rather
than establishing a PROT_NONE VMA, it operates at the level of page table
entries - establishing PTE markers such that accesses to them cause a fault
followed by a SIGSGEV signal being raised.

This is achieved through the PTE marker mechanism, which we have already
extended to provide PTE_MARKER_GUARD, which we installed via the generic
page walking logic which we have extended for this purpose.

These guard ranges are established with MADV_GUARD_INSTALL. If the range in
which they are installed contain any existing mappings, they will be
zapped, i.e. free the range and unmap memory (thus mimicking the behaviour
of MADV_DONTNEED in this respect).

Any existing guard entries will be left untouched. There is therefore no
nesting of guarded pages.

Guarded ranges are NOT cleared by MADV_DONTNEED nor MADV_FREE (in both
instances the memory range may be reused at which point a user would expect
guards to still be in place), but they are cleared via MADV_GUARD_REMOVE,
process teardown or unmapping of memory ranges.

The guard property can be removed from ranges via MADV_GUARD_REMOVE. The
ranges over which this is applied, should they contain non-guard entries,
will be untouched, with only guard entries being cleared.

We permit this operation on anonymous memory only, and only VMAs which are
non-special, non-huge and not mlock()'d (if we permitted this we'd have to
drop locked pages which would be rather counterintuitive).

Racing page faults can cause repeated attempts to install guard pages that
are interrupted, result in a zap, and this process can end up being
repeated. If this happens more than would be expected in normal operation,
we rescind locks and retry the whole thing, which avoids lock contention in
this scenario.

Suggested-by: Vlastimil Babka <redacted>
Suggested-by: Jann Horn <jannh@google.com>
Suggested-by: David Hildenbrand <redacted>
Signed-off-by: Lorenzo Stoakes <redacted>
Acked-by: Vlastimil Babka <redacted>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help