Re: [PATCH 0/4] arm64: mm: support dynamic vmalloc/pmd configuration
From: Christophe Leroy <hidden>
Date: 2024-02-21 07:32:12
Also in:
bpf, linux-arch, linux-efi, linux-mm, linux-riscv, linux-s390, lkml
On 20/02/2024 at 21:32, Maxwell Bland wrote:
> Reworks ARM's virtual memory allocation infrastructure to support dynamic enforcement of page middle directory PXNTable restrictions rather than only during the initial memory mapping. Runtime enforcement of this bit prevents write-then-execute attacks, where malicious code is staged in vmalloc'd data regions, and later the page table is changed to make this code executable. Previously the entire region from VMALLOC_START to VMALLOC_END was vulnerable, but now the vulnerable region is restricted to the 2GB reserved by module_alloc, a region which is generally read-only and more difficult to inject staging code into, e.g., data must pass the BPF verifier.
>
> These changes also set the stage for other systems, such as KVM-level (EL2) changes to mark page tables immutable and code page verification changes, forging a path toward complete mitigation of kernel exploits on ARM.
>
> Implementing this required minimal changes to the generic vmalloc interface in the kernel to allow architecture overrides of some vmalloc wrapper functions, refactoring vmalloc calls to use a standard interface in the generic kernel, and passing the address parameter already passed into PTE allocation to the pte_allocate child function call.
>
> The new arm64 vmalloc wrapper functions ensure vmalloc data is not allocated into the region reserved for module_alloc. arm64 BPF and kprobe code also see a two-line-change ensuring their allocations abide by the segmentation of code from data. Finally, arm64's pmd_populate function is modified to set the PXNTable bit appropriately.
On powerpc (book3s/32) we have more or less the same, although it is not directly linked to PMDs: the virtual 4G address space is split into segments of 256M. Each segment has a bit called NX to forbid execution. Vmalloc space is allocated in a segment with the NX bit set, while module space is allocated in a segment with the NX bit unset.

We never have to override vmalloc wrappers. All consumers of exec memory allocate it using module_alloc(), while vmalloc() provides non-exec memory.

For modules, all you have to do is select ARCH_WANTS_MODULES_DATA_IN_VMALLOC and module data will be allocated using vmalloc(), hence non-exec memory in our case.

Christophe
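As a rough illustration of the segment arithmetic described above, here is a minimal userspace sketch in C. It is not kernel code: segment_nx[], init_example_layout() and the chosen segment numbers (0xB for modules, 0xF for vmalloc) are assumptions made for the example only. The one thing taken from the description is that a 32-bit address space split into 256M segments gives 16 segments, selected by the top four address bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEGMENT_SHIFT 28u   /* 256M == 1 << 28 */
#define NUM_SEGMENTS  16u   /* 4G / 256M */

/* Hypothetical model: one no-execute flag per 256M segment. */
static bool segment_nx[NUM_SEGMENTS];

/* The segment is selected by the top four bits of a 32-bit address. */
static unsigned int segment_index(uint32_t addr)
{
    return addr >> SEGMENT_SHIFT;
}

/* Set or clear the NX flag of one segment in the model. */
static void segment_set_nx(unsigned int seg, bool nx)
{
    assert(seg < NUM_SEGMENTS);
    segment_nx[seg] = nx;
}

/* Execution is allowed only where the segment's NX flag is clear. */
static bool may_execute(uint32_t addr)
{
    return !segment_nx[segment_index(addr)];
}

/*
 * Assumed layout, for illustration only: segment 0xB holds module
 * space (NX clear), segment 0xF holds vmalloc space (NX set).
 */
static void init_example_layout(void)
{
    segment_set_nx(0xB, false);
    segment_set_nx(0xF, true);
}
```

With that assumed layout, calling init_example_layout() makes may_execute() return true for an address in segment 0xB and false for one in segment 0xF. The point of the sketch is that executability is decided by which segment an allocation lands in, so no per-allocation override is needed.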