Re: [PATCH 22/35] x86/mm: Prevent VM_WRITE shadow stacks
From: Dave Hansen <hidden>
Date: 2022-02-11 22:19:58
Also in:
linux-api, linux-arch, linux-mm, lkml
On 1/30/22 13:18, Rick Edgecombe wrote:
Shadow stack accesses are writes from handle_mm_fault() perspective. So to generate the correct PTE, maybe_mkwrite() will rely on the presence of VM_SHADOW_STACK or VM_WRITE in the vma. In future patches, when VM_SHADOW_STACK is actually creatable by userspace, a problem could happen if a user calls mprotect( , , PROT_WRITE) on VM_SHADOW_STACK shadow stack memory. The code would then be confused in the event of shadow stack accesses, and create a writable PTE for a shadow stack access. Then the process would fault in a loop. Prevent this from happening by blocking this kind of memory (VM_WRITE and VM_SHADOW_STACK) from being created, instead of complicating the fault handler logic to handle it. Add an x86 arch_validate_flags() implementation to handle the check. Rename the uapi/asm/mman.h header guard to be able to use it for arch/x86/include/asm/mman.h where the arch_validate_flags() will be.
It would be great if this also said: There is an existing arch_validate_flags() hook for mmap() and mprotect() which allows architectures to reject unwanted ->vm_flags combinations. Add an implementation for x86. That's somewhat implied from what is there already, but making it more clear would be nice. There's a much higher bar to add a new arch hook than to just implement an existing one.
quoted hunk ↗ jump to hunk
diff --git a/arch/x86/include/asm/mman.h b/arch/x86/include/asm/mman.h new file mode 100644 index 000000000000..b44fe31deb3a --- /dev/null +++ b/arch/x86/include/asm/mman.h@@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_MMAN_H +#define _ASM_X86_MMAN_H + +#include <linux/mm.h> +#include <uapi/asm/mman.h> + +#ifdef CONFIG_X86_SHADOW_STACK +static inline bool arch_validate_flags(unsigned long vm_flags) +{ + if ((vm_flags & VM_SHADOW_STACK) && (vm_flags & VM_WRITE)) + return false; + + return true; +}
The design decision here seems to be that VM_SHADOW_STACK is itself a pseudo-VM_WRITE flag. Like you said: "Shadow stack accesses are writes from handle_mm_fault()". Very early on, this series seems to have made the decision that shadow stacks are writable and need lots of write handling behavior, *BUT* shouldn't have VM_WRITE set. As a whole, that seems odd. The alternative would be *requiring* VM_WRITE and VM_SHADOW_STACK be set together. I guess the downside is that pte_mkwrite() would need to be made to work on shadow stack PTEs. That particular design decision was never discussed. I think it has a really big impact on the rest of the series. What do you think? Was it a good idea? Or would the alternative be more complicated than what you have now?