Re: [RFC PATCH v3 12/24] x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for _PAGE_DIRTY_SW
From: Jann Horn <jannh@google.com>
Date: 2018-08-30 21:47:46
Also in:
linux-api, linux-arch, linux-mm, lkml
On Thu, Aug 30, 2018 at 11:01 PM Jann Horn [off-list ref] wrote:
> On Thu, Aug 30, 2018 at 10:57 PM Yu-cheng Yu [off-list ref] wrote:
> > On Thu, 2018-08-30 at 22:44 +0200, Jann Horn wrote:
> > > On Thu, Aug 30, 2018 at 10:25 PM Yu-cheng Yu [off-list ref] wrote:
> > > > ...
> > > > In the flow you described, if C writes to the overflow page before
> > > > B gets in with a 'call', the return address is still correct for B.
> > > > To make an attack, C needs to write again before the TLB flush. I
> > > > agree that is possible. Assume we have a guard page: can someone, in
> > > > the short window, do recursive calls in B, move SSP to the end of the
> > > > guard page, and trigger the same again? He can simply take the
> > > > INCSSP route.
> > >
> > > I don't understand what you're saying. If the shadow stack is between
> > > guard pages, you should never be able to move SSP past that area's
> > > guard pages without an appropriate shadow stack token (not even with
> > > INCSSP, since that has a maximum range of PAGE_SIZE/2), and therefore,
> > > it shouldn't matter whether memory outside that range is incorrectly
> > > marked as shadow stack. Am I missing something?
> >
> > INCSSP has a range of 256, but we can do multiples of that. But I
> > realize the key is not to have the transient SHSTK page at all. The
> > guard page is !pte_write(), and even if we have flaws in
> > ptep_set_wrprotect(), there will not be any transient SHSTK pages. I
> > will add guard pages to both ends. Still thinking how to fix
> > ptep_set_wrprotect().
>
> cmpxchg loop? Or is that slow?
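The "transient SHSTK page" problem can be illustrated with a small model. With CET enabled, a present PTE that has R/W=0 and Dirty=1 is treated as a shadow stack page, so a write-protect that clears R/W in one store and moves the hardware Dirty bit in a second store briefly publishes exactly that encoding. This is a userspace sketch, not kernel code: the PTE is a plain uint64_t, the helper names are hypothetical, and the software-dirty bit position (57) is an assumption chosen only for illustration; R/W (bit 1) and Dirty (bit 6) are the architectural x86 positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 PTE bits. */
#define PTE_PRESENT   (UINT64_C(1) << 0)
#define PTE_RW        (UINT64_C(1) << 1)
#define PTE_DIRTY_HW  (UINT64_C(1) << 6)
/* Hypothetical software-dirty bit position, for illustration only. */
#define PTE_DIRTY_SW  (UINT64_C(1) << 57)

static_assert((PTE_DIRTY_SW & (PTE_RW | PTE_DIRTY_HW)) == 0,
	      "hypothetical SW dirty bit must not alias R/W or Dirty");

/* With CET enabled, a present PTE with R/W=0 and Dirty=1 is a shadow stack page. */
static bool looks_like_shstk(uint64_t pte)
{
	return (pte & PTE_PRESENT) && !(pte & PTE_RW) && (pte & PTE_DIRTY_HW);
}

/*
 * Hypothetical non-atomic write-protect done as two separate stores.
 * Returns true if either stored value has the shadow stack encoding,
 * i.e. another CPU could observe a transient SHSTK page.
 */
static bool two_step_exposes_shstk(uint64_t pte)
{
	uint64_t step1 = pte & ~PTE_RW;            /* store 1: clear R/W */
	uint64_t step2 = step1 & ~PTE_DIRTY_HW;    /* store 2: move Dirty */

	if (step1 & PTE_DIRTY_HW)
		step2 |= PTE_DIRTY_SW;
	return looks_like_shstk(step1) || looks_like_shstk(step2);
}
```

For a writable, dirty PTE, two_step_exposes_shstk() returns true because the first store already has R/W=0 and Dirty=1; for a clean PTE it returns false.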
Something like this:
static inline void ptep_set_wrprotect(struct mm_struct *mm,
				      unsigned long addr, pte_t *ptep)
{
	pte_t old_pte = READ_ONCE(*ptep), new_pte;

	/* ... your comment about not needing a TLB shootdown here ... */
	do {
		new_pte = pte_wrprotect(old_pte);
		/* note: relies on _PAGE_DIRTY_HW < _PAGE_DIRTY_SW */
		/* dirty direct bit-twiddling; you can probably write
		   this in a nicer way */
		new_pte.pte |= (new_pte.pte & _PAGE_DIRTY_HW) >>
				_PAGE_BIT_DIRTY_HW << _PAGE_BIT_DIRTY_SW;
		new_pte.pte &= ~_PAGE_DIRTY_HW;
		/* on failure, try_cmpxchg() reloads old_pte; recompute */
	} while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte));
}
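The same idea can be modelled in userspace with C11 <stdatomic.h>: compute the complete write-protected value from a snapshot and publish it with a single compare-and-swap, retrying if the entry changed in between. This is only a sketch under assumptions: the PTE is an _Atomic uint64_t rather than a pte_t, the function names are hypothetical, and the software-dirty bit position is made up for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define PTE_PRESENT   (UINT64_C(1) << 0)
#define PTE_RW        (UINT64_C(1) << 1)
#define PTE_DIRTY_HW  (UINT64_C(1) << 6)
/* Hypothetical software-dirty bit, for illustration only. */
#define PTE_DIRTY_SW  (UINT64_C(1) << 57)

static_assert((PTE_DIRTY_SW & (PTE_RW | PTE_DIRTY_HW)) == 0,
	      "hypothetical SW dirty bit must not alias R/W or Dirty");

/* Compute the write-protected value: clear R/W, move Dirty to the SW bit. */
static uint64_t wrprotect_value(uint64_t pte)
{
	uint64_t new_pte = pte & ~PTE_RW;

	if (new_pte & PTE_DIRTY_HW)
		new_pte = (new_pte & ~PTE_DIRTY_HW) | PTE_DIRTY_SW;
	return new_pte;
}

/*
 * Userspace model of the cmpxchg loop: the whole transition is published
 * by one compare-and-swap, so no intermediate value is ever stored. If
 * another writer changes the entry between the load and the CAS, the CAS
 * fails, 'old' is refreshed with the current contents, and the new value
 * is recomputed from them.
 */
static void model_set_wrprotect(_Atomic uint64_t *ptep)
{
	uint64_t old = atomic_load(ptep);

	while (!atomic_compare_exchange_weak(ptep, &old, wrprotect_value(old)))
		;	/* 'old' now holds the current value; retry */
}
```

Because the only store is the successful compare-and-swap, the update itself never stores an intermediate value with R/W=0 and the hardware Dirty bit still set.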
I think this has the advantage of not generating weird spurious pagefaults.
It's not compatible with Xen PV, but I'm guessing that this whole
feature isn't going to support Xen PV anyway? So you could switch
between two implementations of ptep_set_wrprotect using the pvop
mechanism or something similar: one for environments that support
shadow stacks, one for all other environments.
Or is there some arcane reason why cmpxchg doesn't work here the way I
think it should?
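The split into two implementations can be sketched as a function pointer chosen once at start-up. This is a loose userspace analogy for a pvop-style indirection, not the real paravirt machinery: struct pte_ops, pte_ops_init() and the shstk_supported flag are hypothetical, the fallback merely clears R/W atomically (loosely modelled on the clear_bit() used by the existing x86 ptep_set_wrprotect()), and the software-dirty bit position is again an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_RW        (UINT64_C(1) << 1)
#define PTE_DIRTY_HW  (UINT64_C(1) << 6)
#define PTE_DIRTY_SW  (UINT64_C(1) << 57)	/* hypothetical bit */

static_assert((PTE_DIRTY_SW & (PTE_RW | PTE_DIRTY_HW)) == 0,
	      "hypothetical SW dirty bit must not alias R/W or Dirty");

/* Fallback: atomically clear only R/W; Dirty is left untouched. */
static void wrprotect_plain(_Atomic uint64_t *ptep)
{
	atomic_fetch_and(ptep, ~PTE_RW);
}

/* Shadow-stack-aware: CAS loop that also moves Dirty to the SW bit. */
static void wrprotect_shstk(_Atomic uint64_t *ptep)
{
	uint64_t old = atomic_load(ptep), new_pte;

	do {
		new_pte = old & ~PTE_RW;
		if (new_pte & PTE_DIRTY_HW)
			new_pte = (new_pte & ~PTE_DIRTY_HW) | PTE_DIRTY_SW;
	} while (!atomic_compare_exchange_weak(ptep, &old, new_pte));
}

/* Hypothetical ops table standing in for a pvop-style indirection. */
struct pte_ops {
	void (*set_wrprotect)(_Atomic uint64_t *ptep);
};

static struct pte_ops pte_ops = { .set_wrprotect = wrprotect_plain };

/* Hypothetical boot-time hook: pick the implementation once. */
static void pte_ops_init(bool shstk_supported)
{
	pte_ops.set_wrprotect = shstk_supported ? wrprotect_shstk
						: wrprotect_plain;
}
```

Calling pte_ops_init(true) makes later set_wrprotect() calls move Dirty into the software bit; with false, only R/W is cleared and Dirty stays where it was.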