--- v16
+++ v21
@@ -7,11 +7,11 @@
The first case is that some processors can start a write but end up seeing
a read-only PTE by the time they get to the Dirty bit, creating a transient
shadow stack PTE. However, this will not occur on processors supporting
-Shadow Stack, therefore we don't need a TLB flush here.
+Shadow Stack, and a TLB flush is not necessary.
-The second case is that when the software, without atomic, tests & replaces
-_PAGE_DIRTY with _PAGE_COW, a transient shadow stack PTE can exist.
-This is prevented with cmpxchg.
+The second case is that when _PAGE_DIRTY is replaced with _PAGE_COW non-
+atomically, a transient shadow stack PTE can be created as a result.
+Thus, prevent that with cmpxchg.
Dave Hansen, Jann Horn, Andy Lutomirski, and Peter Zijlstra provided many
insights to the issue. Jann Horn provided the cmpxchg solution.
@@ -19,39 +19,31 @@
Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
- arch/x86/include/asm/pgtable.h | 52 ++++++++++++++++++++++++++++++++++
- 1 file changed, 52 insertions(+)
+ arch/x86/include/asm/pgtable.h | 36 ++++++++++++++++++++++++++++++++++
+ 1 file changed, 36 insertions(+)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
-index 666c25ab9564..1c84f1ba32b9 100644
+index 7a7f6af01ab5..71cc4abec704 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
-@@ -1226,6 +1226,32 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
+@@ -1280,6 +1280,24 @@ static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm,
static inline void ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep)
{
+ /*
-+ * Some processors can start a write, but end up seeing a read-only
-+ * PTE by the time they get to the Dirty bit. In this case, they
-+ * will set the Dirty bit, leaving a read-only, Dirty PTE which
-+ * looks like a shadow stack PTE.
-+ *
-+ * However, this behavior has been improved and will not occur on
-+ * processors supporting Shadow Stack. Without this guarantee, a
-+ * transition to a non-present PTE and flush the TLB would be
-+ * needed.
-+ *
-+ * When changing a writable PTE to read-only and if the PTE has
-+ * _PAGE_DIRTY set, move that bit to _PAGE_COW so that the PTE is
-+ * not a shadow stack PTE.
++ * If Shadow Stack is enabled, pte_wrprotect() moves _PAGE_DIRTY
++ * to _PAGE_COW (see comments at pte_wrprotect()).
++ * When a thread reads a RW=1, Dirty=0 PTE and before changing it
++ * to RW=0, Dirty=0, another thread could have written to the page
++ * and the PTE is RW=1, Dirty=1 now. Use try_cmpxchg() to detect
++ * PTE changes and update old_pte, then try again.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+ pte_t old_pte, new_pte;
+
++ old_pte = READ_ONCE(*ptep);
+ do {
-+ old_pte = READ_ONCE(*ptep);
+ new_pte = pte_wrprotect(old_pte);
-+
+ } while (!try_cmpxchg(&ptep->pte, &old_pte.pte, new_pte.pte));
+
+ return;
@@ -59,32 +51,24 @@
clear_bit(_PAGE_BIT_RW, (unsigned long *)&ptep->pte);
}
-@@ -1282,6 +1308,32 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
+@@ -1330,6 +1348,24 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm,
static inline void pmdp_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pmd_t *pmdp)
{
+ /*
-+ * Some processors can start a write, but end up seeing a read-only
-+ * PMD by the time they get to the Dirty bit. In this case, they
-+ * will set the Dirty bit, leaving a read-only, Dirty PMD which
-+ * looks like a Shadow Stack PMD.
-+ *
-+ * However, this behavior has been improved and will not occur on
-+ * processors supporting Shadow Stack. Without this guarantee, a
-+ * transition to a non-present PMD and flush the TLB would be
-+ * needed.
-+ *
-+ * When changing a writable PMD to read-only and if the PMD has
-+ * _PAGE_DIRTY set, move that bit to _PAGE_COW so that the PMD is
-+ * not a shadow stack PMD.
++ * If Shadow Stack is enabled, pmd_wrprotect() moves _PAGE_DIRTY
++ * to _PAGE_COW (see comments at pmd_wrprotect()).
++ * When a thread reads a RW=1, Dirty=0 PMD and before changing it
++ * to RW=0, Dirty=0, another thread could have written to the page
++ * and the PMD is RW=1, Dirty=1 now. Use try_cmpxchg() to detect
++ * PMD changes and update old_pmd, then try again.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_SHSTK)) {
+ pmd_t old_pmd, new_pmd;
+
++ old_pmd = READ_ONCE(*pmdp);
+ do {
-+ old_pmd = READ_ONCE(*pmdp);
+ new_pmd = pmd_wrprotect(old_pmd);
-+
+ } while (!try_cmpxchg((pmdval_t *)pmdp, (pmdval_t *)&old_pmd, pmd_val(new_pmd)));
+
+ return;