Re: [PATCH] mm: mmu_notifier: fix inconsistent memory between secondary MMU and host

From: Xiao Guangrong <hidden>
Date: 2012-08-22 06:03:53
Also in: kvm, lkml

On 08/21/2012 11:06 PM, Andrea Arcangeli wrote:

On Tue, Aug 21, 2012 at 05:46:39PM +0800, Xiao Guangrong wrote:

quoted

There has a bug in set_pte_at_notify which always set the pte to the
new page before release the old page in secondary MMU, at this time,
the process will access on the new page, but the secondary MMU still
access on the old page, the memory is inconsistent between them

Below scenario shows the bug more clearly:

at the beginning: *p = 0, and p is write-protected by KSM or shared with
parent process

CPU 0                                       CPU 1
write 1 to p to trigger COW,
set_pte_at_notify will be called:
  *pte = new_page + W; /* The W bit of pte is set */

                                     *p = 1; /* pte is valid, so no #PF */

                                     return back to secondary MMU, then
                                     the secondary MMU read p, but get:
                                     *p == 0;

                         /*
                          * !!!!!!
                          * the host has already set p to 1, but the secondary
                          * MMU still get the old value 0
                          */

  call mmu_notifier_change_pte to release
  old page in secondary MMU

The KSM usage of it looks safe because it will only establish readonly
ptes with it.

It seems a problem only for do_wp_page. It wasn't safe to setup
writable ptes with it. I guess we first introduced it for KSM and then
we added it to do_wp_page too by mistake.

The race window is really tiny, it's unlikely it has ever triggered,
however this one seem to be possible so it's slightly more serious
than the other race you recently found (the previous one in the exit
path I think it was impossible to trigger with KVM).

quoted

We can fix it by release old page first, then set the pte to the new
page.

Note, the new page will be firstly used in secondary MMU before it is
mapped into the page table of the process, but this is safe because it
is protected by the page table lock, there is no race to change the pte

Signed-off-by: Xiao Guangrong <redacted>
---
 include/linux/mmu_notifier.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 1d1b1e1..8c7435a 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h

@@ -317,8 +317,8 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 	unsigned long ___address = __address;				\
 	pte_t ___pte = __pte;						\
 									\
-	set_pte_at(___mm, ___address, __ptep, ___pte);			\
 	mmu_notifier_change_pte(___mm, ___address, ___pte);		\
+	set_pte_at(___mm, ___address, __ptep, ___pte);			\
 })

If we establish the spte on the new page, what will happen is the same
race in reverse. The fundamental problem is that the first guy that
writes to the "newpage" (guest or host) won't fault again and so it
will fail to serialize against the PT lock.

CPU0  		    	    	CPU1
				oldpage[1] == 0 (both guest & host)
oldpage[0] = 1
trigger do_wp_page

We always do ptep_clear_flush before set_pte_at_notify(),
at this point, we have done:
  pte = 0 and flush all tlbs

mmu_notifier_change_pte
spte = newpage + writable
				guest does newpage[1] = 1
				vmexit
				host read oldpage[1] == 0

                  It can not happen, at this point pte = 0, host can not
		  access oldpage anymore, host read can generate #PF, it
                  will be blocked on page table lock until CPU 0 release the lock.

pte = newpage + writable (too late)




--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help