Thread: 17 messages, 5 authors, 2024-02-21

Re: [PATCH 0/4] arm64: mm: support dynamic vmalloc/pmd configuration

From: Christophe Leroy <hidden>
Date: 2024-02-21 07:32:12
Also in: bpf, linux-arch, linux-efi, linux-mm, linux-riscv, linux-s390, lkml


On 20/02/2024 at 21:32, Maxwell Bland wrote:

> Reworks ARM's virtual memory allocation infrastructure to support
> dynamic enforcement of page middle directory PXNTable restrictions
> rather than only during the initial memory mapping. Runtime enforcement
> of this bit prevents write-then-execute attacks, where malicious code is
> staged in vmalloc'd data regions, and later the page table is changed to
> make this code executable.
>
> Previously the entire region from VMALLOC_START to VMALLOC_END was
> vulnerable, but now the vulnerable region is restricted to the 2GB
> reserved by module_alloc, a region which is generally read-only and more
> difficult to inject staging code into, e.g., data must pass the BPF
> verifier. These changes also set the stage for other systems, such as
> KVM-level (EL2) changes to mark page tables immutable and code page
> verification changes, forging a path toward complete mitigation of
> kernel exploits on ARM.
>
> Implementing this required minimal changes to the generic vmalloc
> interface in the kernel to allow architecture overrides of some vmalloc
> wrapper functions, refactoring vmalloc calls to use a standard interface
> in the generic kernel, and passing the address parameter already passed
> into PTE allocation to the pte_allocate child function call.
>
> The new arm64 vmalloc wrapper functions ensure vmalloc data is not
> allocated into the region reserved for module_alloc. arm64 BPF and
> kprobe code also see a two-line-change ensuring their allocations abide
> by the segmentation of code from data. Finally, arm64's pmd_populate
> function is modified to set the PXNTable bit appropriately.

On powerpc (book3s/32) we have more or less the same, although it is not
directly linked to PMDs: the 4G virtual address space is split into
segments of 256M. Each segment has a bit called NX that forbids
execution. Vmalloc space is allocated in a segment with the NX bit set, while
module space is allocated in a segment with the NX bit unset. We never have
to override vmalloc wrappers. All consumers of exec memory allocate it
using module_alloc(), while vmalloc() provides non-exec memory.

For modules, all you have to do is select
ARCH_WANTS_MODULES_DATA_IN_VMALLOC; module data will then be allocated
using vmalloc(), hence in non-exec memory in our case.

Christophe