Thread (25 messages, 4 authors, 2018-09-05)

Re: [PATCH v6 1/2] mm: migration: fix migration of huge PMD shared pages

From: Mike Kravetz <hidden>
Date: 2018-08-30 18:06:51
Also in: linux-mm, linux-rdma, lkml

On 08/30/2018 09:57 AM, Jerome Glisse wrote:
On Thu, Aug 30, 2018 at 06:19:52PM +0200, Michal Hocko wrote:
On Thu 30-08-18 10:08:25, Jerome Glisse wrote:
On Thu, Aug 30, 2018 at 12:56:16PM +0200, Michal Hocko wrote:
On Wed 29-08-18 17:11:07, Jerome Glisse wrote:
On Wed, Aug 29, 2018 at 08:39:06PM +0200, Michal Hocko wrote:
On Wed 29-08-18 14:14:25, Jerome Glisse wrote:
On Wed, Aug 29, 2018 at 10:24:44AM -0700, Mike Kravetz wrote:
[...]
What would be the best mmu notifier interface to use where there are no
start/end calls?
Or, is the best solution to add the start/end calls as is done in later
versions of the code?  If that is the suggestion, has there been any change
in invalidate start/end semantics that we should take into account?
start/end would be the ones to add; 4.4 seems broken with respect to THP
and mmu notification. Another solution is to fix the users of mmu
notifiers; there were only a handful back then. For instance, properly
adjusting the address to match the first address covered by the pmd or
pud, and passing down the correct page size to
mmu_notifier_invalidate_page(), would fix this easily.
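A minimal sketch of the address adjustment described above, assuming x86-64 geometry (4 KiB base pages, 2 MiB per PMD entry, 1 GiB per PUD entry). The helpers `entry_first_addr()` and `compound_bytes()` are hypothetical; they only illustrate the arithmetic a notifier user would need and are not a kernel API.

```c
#include <assert.h>

/* Assumed x86-64 geometry; the kernel takes these from arch headers. */
#define SKETCH_PAGE_SHIFT 12            /* 4 KiB base pages */
#define SKETCH_PMD_SIZE (1UL << 21)     /* one PMD entry maps 2 MiB */
#define SKETCH_PUD_SIZE (1UL << 30)     /* one PUD entry maps 1 GiB */

/* Hypothetical: first address covered by the entry of 'size' that maps 'addr'. */
static unsigned long entry_first_addr(unsigned long addr, unsigned long size)
{
	/* the mask trick only works for power-of-two sizes */
	assert(size != 0 && (size & (size - 1)) == 0);
	return addr & ~(size - 1);
}

/* Hypothetical: bytes mapped by a compound page of the given order. */
static unsigned long compound_bytes(unsigned int order)
{
	return (1UL << SKETCH_PAGE_SHIFT) << order;
}
```

With these assumptions, an address inside a 2 MiB huge page rounds down to the PMD boundary, and an order-9 compound page spans exactly one PMD entry.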

This is OK because try_to_unmap_one() replaces the pte/pmd/pud with an
invalid one (either poison, migration or swap) inside the function. So
anyone racing would synchronize on those special entries, which is why it
is fine to delay mmu_notifier_invalidate_page() until after dropping the
page table lock.

Adding start/end might be the solution with less code churn, as you would
only need to change try_to_unmap_one().
What about dependencies? 369ea8242c0fb sounds like it requires all
notifiers to be updated as well.
That commit removes mmu_notifier_invalidate_page(), which is why everything
needs to be updated. But in 4.4 you can get away with just adding start/end
and keeping mmu_notifier_invalidate_page() around to minimize disruption.
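The call ordering this implies can be sketched in user space. The event log and `sketch_unmap_one()` below are hypothetical stand-ins for a notifier listener and for the 4.4 try_to_unmap_one(); they only show that the kept per-page callback ends up bracketed by a range start/end pair, with the start issued before the page table lock is taken.

```c
#include <assert.h>

/* Hypothetical events a secondary-MMU listener (e.g. KVM) would observe. */
enum sketch_event { EV_RANGE_START, EV_INVALIDATE_PAGE, EV_RANGE_END };

struct sketch_log {
	enum sketch_event ev[8];
	int n;
};

static void sketch_record(struct sketch_log *log, enum sketch_event e)
{
	assert(log->n < 8); /* a fixed-size log is enough for this sketch */
	log->ev[log->n++] = e;
}

/*
 * Hypothetical model of the 4.4 backport idea: keep the old per-page
 * callback and bracket it with a range start/end pair.
 */
static void sketch_unmap_one(struct sketch_log *log)
{
	/* issued before page_check_address() takes the page table lock */
	sketch_record(log, EV_RANGE_START);

	/* ... under the lock: entry replaced by a migration/swap entry ... */
	/* ... lock dropped ... */

	/* the existing 4.4 callback, kept unchanged */
	sketch_record(log, EV_INVALIDATE_PAGE);

	/* closes the bracket before returning */
	sketch_record(log, EV_RANGE_END);
}
```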
OK, this is really interesting. I was really worried about changing the
semantics of the mmu notifiers in stable kernels, because this is a really
hard-to-review change and high risk for anybody running those old
kernels. If we can keep mmu_notifier_invalidate_page() and wrap it in the
range-scoped API, then this sounds like the best way forward.

So, just to make sure we are on the same page: does this sound good for
the stable 4.4 backport? Mike's hugetlb shared-pmd fixup can be applied on
top. What do you think?
You need to invalidate outside the page table lock, so before the call to
page_check_address(). For instance, like the patch below, which also only
does the range invalidation for huge pages, avoiding too much of a
behavior change for users of mmu notifiers.
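For reference, the backport quoted later in this thread computes the range end as `end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page)))`. A user-space sketch of that arithmetic, assuming 4 KiB base pages; `struct sketch_vma` and `sketch_range_end()` are hypothetical names:

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB base pages */

/* Hypothetical stand-in for the vma bounds used by the patch. */
struct sketch_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Mirrors the patch's end computation: cover the whole compound page of
 * the given order, but never extend past the end of the vma.
 */
static unsigned long sketch_range_end(const struct sketch_vma *vma,
				      unsigned long start, unsigned int order)
{
	unsigned long end = start + (SKETCH_PAGE_SIZE << order);

	assert(start >= vma->vm_start && start < vma->vm_end);
	return end < vma->vm_end ? end : vma->vm_end;
}
```

An order-9 page at 0x400000 yields an end of 0x600000 unless the vma stops earlier, in which case the range is clamped to vm_end.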
Right. I would rather not make this PageHuge special, though. So the
fixed version should be:
Why not test for huge? Only huge pages are broken, and thus only they
need the extra range invalidation. Doing the double invalidation
for a single page is a bit of overkill.
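To make the cost argument concrete, here is a hypothetical count of notifier callbacks per successful unmap under the two policies being debated. It assumes the per-page invalidate always fires and treats any compound order above zero as "huge"; both are simplifications for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The two policies discussed in the thread (names are hypothetical). */
enum sketch_policy { WRAP_ALWAYS, WRAP_HUGE_ONLY };

/*
 * Notifier callbacks issued by one successful unmap: the 4.4 per-page
 * invalidate, plus a range start/end pair when the call is wrapped.
 * Simplification: any order > 0 counts as "huge".
 */
static int sketch_notifier_calls(enum sketch_policy policy, unsigned int order)
{
	bool wrap;

	assert(policy == WRAP_ALWAYS || policy == WRAP_HUGE_ONLY);
	wrap = (policy == WRAP_ALWAYS) || order > 0;
	return 1 + (wrap ? 2 : 0);
}
```

Under these assumptions, wrapping unconditionally issues 3 callbacks instead of 1 for an ordinary page, while huge pages get the same 3 under either policy; the trade-off is that extra overhead versus avoiding a PageHuge special case.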
I am a bit confused, and hope this does not add to any confusion by others.

IIUC, the patch below does not attempt to 'fix' anything.  It is simply
there to add the start/end notifiers to the v4.4 version of this routine
so that a subsequent patch can use them (with modified ranges) to handle
unmapping a shared pmd huge page.  That is the mainline fix which started
this thread.

Since we are only/mostly interested in fixing the shared pmd issue in
4.4, how about just adding the start/end notifiers to the very specific
case where pmd sharing is possible?
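One way to picture "the very specific case where pmd sharing is possible" is a helper that widens the notifier range to the PUD-aligned region a shared PMD page could map. The sketch below is hypothetical: it assumes x86-64 sizes (a PUD entry maps 1 GiB), models a shared hugetlb mapping with a `may_share` flag, and treats sharing as possible only when the whole aligned region lies inside the vma. It illustrates the idea and is not the mainline code.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PUD_SIZE (1UL << 30)             /* assumption: x86-64, 1 GiB */
#define SKETCH_PUD_MASK (~(SKETCH_PUD_SIZE - 1))

/* Hypothetical vma model; may_share stands in for a shared hugetlb mapping. */
struct sketch_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	bool may_share;
};

/*
 * Hypothetical helper: if a shared PMD page could map part of [*start, *end),
 * widen the range to every PUD-aligned region that lies entirely inside the
 * vma, so a single notification covers all addresses the shared page maps.
 */
static void sketch_widen_if_pmd_shared(const struct sketch_vma *vma,
				       unsigned long *start, unsigned long *end)
{
	unsigned long orig_end = *end;
	unsigned long addr;

	assert(*start < *end);
	if (!vma->may_share)
		return;          /* in this model, private mappings never share */

	/* walk the aligned regions touched by the original range */
	for (addr = *start & SKETCH_PUD_MASK; addr < orig_end;
	     addr += SKETCH_PUD_SIZE) {
		unsigned long a_end = addr + SKETCH_PUD_SIZE;

		if (addr >= vma->vm_start && a_end <= vma->vm_end) {
			if (addr < *start)
				*start = addr;
			if (a_end > *end)
				*end = a_end;
		}
	}
}
```

A 2 MiB range inside a shared vma that covers the whole surrounding 1 GiB region is widened to that region; private mappings, and vmas that only partly cover the region, keep the original range.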

I can see the value in trying to backport dependent patches such as this
so that stable releases look more like mainline.  However, I am not sure of
the value in this case, as this patch was part of a larger set changing
notifier semantics.

-- 
Mike Kravetz
Also, the patch below is buggy: you need to add an out_notify: label to
avoid an imbalance in the start/end callbacks.
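The imbalance can be illustrated with a stripped-down model of the control flow. Everything here is hypothetical (`struct sketch_counts`, `sketch_try_to_unmap_one()`, the boolean inputs); the point is only that once range_start has been issued, every exit must pass through an `out_notify:` label that issues range_end, while the early munlock exit taken before range_start may still jump to `out:`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters standing in for a notifier listener. */
struct sketch_counts {
	int starts;
	int ends;
};

/*
 * 'munlock_skip' models the early exit taken before range_start;
 * 'pte_found' models page_check_address() succeeding.
 */
static int sketch_try_to_unmap_one(struct sketch_counts *c,
				   bool munlock_skip, bool pte_found)
{
	int ret = 1; /* stands in for SWAP_AGAIN */

	if (munlock_skip)
		goto out;        /* range_start not issued yet: plain out is fine */

	c->starts++;             /* mmu_notifier_invalidate_range_start() */

	if (!pte_found)
		goto out_notify; /* "goto out" here would leave an unmatched start */

	/* ... unmap work, unlock, mmu_notifier_invalidate_page() ... */

out_notify:
	c->ends++;               /* mmu_notifier_invalidate_range_end() */
out:
	assert(c->starts == c->ends); /* holds for a caller that starts balanced */
	return ret;
}
```

Calling it with `pte_found == false` models the page_check_address() failure path; with the plain `goto out` from the original patch, `starts` would exceed `ends` after such a call.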
From c05849f6789ec36e2ff11adcd8fa6cfb05e870a9 Mon Sep 17 00:00:00 2001
From: Jérôme Glisse <redacted>
Date: Thu, 31 Aug 2017 17:17:27 -0400
Subject: [PATCH] mm/rmap: update to new mmu_notifier semantic v2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

commit 369ea8242c0fb5239b4ddf0dc568f694bd244de4 upstream.

Please note that this patch differs from mainline because we do not
really replace mmu_notifier_invalidate_page by mmu_notifier_invalidate_range,
because that requires changes to most of the existing mmu notifiers. We also
do not want to change the semantics of this API in old kernels. Anyway,
Jerome has suggested that it should be sufficient to simply wrap
mmu_notifier_invalidate_page by *_invalidate_range_start()/end() to fix
invalidation of larger-than-pte mappings (e.g. THP/hugetlb pages during
migration). We need this change to handle migration of large (hugetlb/THP)
pages properly.

Note that because we cannot presume the pmd value or pte value, we have
to assume the worst and unconditionally report an invalidation as
happening.

Changed since v2:
  - try_to_unmap_one() only one call to mmu_notifier_invalidate_range()
  - compute end with PAGE_SIZE << compound_order(page)
  - fix PageHuge() case in try_to_unmap_one()

Signed-off-by: Jérôme Glisse <redacted>
Reviewed-by: Andrea Arcangeli <redacted>
Cc: Dan Williams <redacted>
Cc: Ross Zwisler <redacted>
Cc: Bernhard Held <redacted>
Cc: Adam Borowski <redacted>
Cc: Radim Krčmář <redacted>
Cc: Wanpeng Li <redacted>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Takashi Iwai <redacted>
Cc: Nadav Amit <redacted>
Cc: Mike Galbraith <redacted>
Cc: Kirill A. Shutemov <redacted>
Cc: axie <redacted>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com> # backport to 4.4
---
 mm/rmap.c | 10 ++++++++++
 1 file changed, 10 insertions(+)
diff --git a/mm/rmap.c b/mm/rmap.c
index 1bceb49aa214..aba994f55d6c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1324,12 +1324,21 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	pte_t pteval;
 	spinlock_t *ptl;
 	int ret = SWAP_AGAIN;
+	unsigned long start = address, end;
 	enum ttu_flags flags = (enum ttu_flags)arg;
 
 	/* munlock has nothing to gain from examining un-locked vmas */
 	if ((flags & TTU_MUNLOCK) && !(vma->vm_flags & VM_LOCKED))
 		goto out;
 
+	/*
+	 * We have to assume the worst case, i.e. a pmd, for invalidation. Note that
+	 * the page cannot be freed in this function as callers of try_to_unmap()
+	 * must hold a reference on the page.
+	 */
+	end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page)));
+	mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);
+
 	pte = page_check_address(page, mm, address, &ptl, 0);
 	if (!pte)
 		goto out;
Instead
 		goto out_notify;
@@ -1450,6 +1459,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	if (ret != SWAP_FAIL && ret != SWAP_MLOCK && !(flags & TTU_MUNLOCK))
 		mmu_notifier_invalidate_page(mm, address);
+out_notify:
+	mmu_notifier_invalidate_range_end(vma->vm_mm, start, end);
 out:
 	return ret;
 }
 
-- 
2.18.0

-- 
Michal Hocko
SUSE Labs