Re: [PATCH v20 08/25] x86/mm: Introduce _PAGE_COW
From: Yu, Yu-cheng <hidden>
Date: 2021-02-10 20:29:52
Also in:
linux-arch, linux-doc, linux-mm, lkml
On 2/10/2021 11:42 AM, Kees Cook wrote:
On Wed, Feb 10, 2021 at 09:56:46AM -0800, Yu-cheng Yu wrote:
There is essentially no room left in the x86 hardware PTEs on some OSes
(not Linux). That left the hardware architects looking for a way to
represent a new memory type (shadow stack) within the existing bits.
They chose to repurpose a lightly-used state: Write=0, Dirty=1.

The reason it's lightly used is that Dirty=1 is normally set by hardware
and cannot normally be set by hardware on a Write=0 PTE. Software must
normally be involved to create one of these PTEs, so software can simply
opt to not create them.

In places where Linux normally creates Write=0, Dirty=1, it can use the
software-defined _PAGE_COW in place of the hardware _PAGE_DIRTY. In
other words, whenever Linux needs to create Write=0, Dirty=1, it instead
creates Write=0, Cow=1, except for shadow stack, which is Write=0,
Dirty=1. This clearly separates shadow stack from other data, and
results in the following:

(a) A modified, copy-on-write (COW) page: (Write=0, Cow=1)

(b) A R/O page that has been COW'ed: (Write=0, Cow=1)
    The user page is in a R/O VMA, and get_user_pages() needs a writable
    copy. The page fault handler creates a copy of the page and sets
    the new copy's PTE as Write=0 and Cow=1.

(c) A shadow stack PTE: (Write=0, Dirty=1)

(d) A shared shadow stack PTE: (Write=0, Cow=1)
    When a shadow stack page is being shared among processes (this
    happens at fork()), its PTE is made Dirty=0, so the next shadow
    stack access causes a fault, and the page is duplicated and Dirty=1
    is set again. This is the COW equivalent for shadow stack pages,
    even though it's copy-on-access rather than copy-on-write.

(e) A page where the processor observed a Write=1 PTE, started a write,
    set Dirty=1, but then observed a Write=0 PTE. That's possible
    today, but will not happen on processors that support shadow stack.

Define _PAGE_COW and update pte_*() helpers and apply the same changes
to pmd and pud.

I still find this commit confusing mostly due to _PAGE_COW being 0 without CET enabled.
Shouldn't this just get changed universally? Why should this change depend on CET?
For example, in...
static inline int pte_write(pte_t pte)
{
	if (cpu_feature_enabled(X86_FEATURE_SHSTK))
		return pte_flags(pte) & (_PAGE_RW | _PAGE_DIRTY);
	else
		return pte_flags(pte) & _PAGE_RW;
}
There are four cases:
(a) RW=1, Dirty=1 -> writable
(b) RW=1, Dirty=0 -> writable
(c) RW=0, Dirty=0 -> not writable
(d) RW=0, Dirty=1 -> shadow stack, or not-writable if !X86_FEATURE_SHSTK
Case (d) is treated as writable (shadow stack) only when shadow stack is
enabled; otherwise it is not writable. With the shadow stack feature,
the usual dirty, copy-on-write PTE becomes RW=0, Cow=1.
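
To make the four cases concrete, here is a minimal user-space model of
the quoted pte_write(). It is only a sketch: the MODEL_* names and the
model_shstk_enabled flag are made up for illustration (the flag stands
in for cpu_feature_enabled(X86_FEATURE_SHSTK)), and flags are passed as
a plain 64-bit value rather than a pte_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 hardware PTE bits: R/W is bit 1, Dirty is bit 6. */
#define MODEL_PAGE_RW    (UINT64_C(1) << 1)
#define MODEL_PAGE_DIRTY (UINT64_C(1) << 6)

/* Hypothetical stand-in for cpu_feature_enabled(X86_FEATURE_SHSTK). */
static bool model_shstk_enabled;

/* Mirrors the quoted pte_write(): true means "treated as writable". */
static bool model_pte_write(uint64_t flags)
{
	if (model_shstk_enabled)
		return (flags & (MODEL_PAGE_RW | MODEL_PAGE_DIRTY)) != 0;

	return (flags & MODEL_PAGE_RW) != 0;
}
```

With the flag clear, only cases (a) and (b) report writable. With the
flag set, case (d) reports writable as well, which is why the ordinary
dirty, write-protected PTE has to move its dirty state out of
_PAGE_DIRTY.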
We can get this changed universally, but all usual dirty, copy-on-write
PTEs need the Dirty/Cow swapping, always. Is that desirable?
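
For reference, the Dirty/Cow swapping could be sketched roughly as
below. This is a user-space illustration under stated assumptions, not
the patch itself: the SKETCH_* names and helper functions are
hypothetical, and bit 58 for the Cow bit is picked only as an example
of a software-available bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 hardware PTE bits: R/W is bit 1, Dirty is bit 6. */
#define SKETCH_PAGE_RW    (UINT64_C(1) << 1)
#define SKETCH_PAGE_DIRTY (UINT64_C(1) << 6)
/* Assumption: a software-defined bit; position 58 is illustrative only. */
#define SKETCH_PAGE_COW   (UINT64_C(1) << 58)

/* Write-protect: clear RW and move Dirty to Cow, so that a normal
 * page never ends up as Write=0, Dirty=1. */
static uint64_t sketch_wrprotect(uint64_t flags)
{
	flags &= ~SKETCH_PAGE_RW;
	if (flags & SKETCH_PAGE_DIRTY) {
		flags &= ~SKETCH_PAGE_DIRTY;
		flags |= SKETCH_PAGE_COW;
	}
	return flags;
}

/* Make writable: set RW and move Cow back to the hardware Dirty bit. */
static uint64_t sketch_mkwrite(uint64_t flags)
{
	flags |= SKETCH_PAGE_RW;
	if (flags & SKETCH_PAGE_COW) {
		flags &= ~SKETCH_PAGE_COW;
		flags |= SKETCH_PAGE_DIRTY;
	}
	return flags;
}

/* A page counts as dirty if either bit carries the dirty state. */
static bool sketch_dirty(uint64_t flags)
{
	return (flags & (SKETCH_PAGE_DIRTY | SKETCH_PAGE_COW)) != 0;
}
```

Doing this universally would mean every write-protect and make-writable
path pays for the extra test and bit move, even on systems without
shadow stack.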
--
Yu-cheng
[...]