Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock

[PATCH v5 0/9] Speedup mremap on ppc64 · Aneesh Kumar K.V <hidden> · 2021-04-22
[PATCH v5 1/9] selftest/mremap_test: Update the test to handle pagesize other than 4K · Aneesh Kumar K.V <hidden> · 2021-04-22
[PATCH v5 2/9] selftest/mremap_test: Avoid crash with static build · Aneesh Kumar K.V <hidden> · 2021-04-22
[PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Aneesh Kumar K.V <hidden> · 2021-04-22
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Nathan Chancellor <nathan@kernel.org> · 2021-05-18
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Aneesh Kumar K.V <hidden> · 2021-05-19
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Nathan Chancellor <nathan@kernel.org> · 2021-05-19
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Peter Xu <peterx@redhat.com> · 2021-05-20
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Aneesh Kumar K.V <hidden> · 2021-05-20
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Peter Xu <peterx@redhat.com> · 2021-05-20
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Aneesh Kumar K.V <hidden> · 2021-05-20
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Aneesh Kumar K.V <hidden> · 2021-05-20
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Peter Xu <peterx@redhat.com> · 2021-05-20
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Zi Yan <ziy@nvidia.com> · 2021-05-20
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Peter Xu <peterx@redhat.com> · 2021-05-20
Re: [PATCH v5 3/9] mm/mremap: Use pmd/pud_poplulate to update page table entries · Kalesh Singh <hidden> · 2021-05-20
[PATCH v5 4/9] powerpc/mm/book3s64: Fix possible build error · Aneesh Kumar K.V <hidden> · 2021-04-22
[PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Aneesh Kumar K.V <hidden> · 2021-04-22
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Guenter Roeck <linux@roeck-us.net> · 2021-05-15
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Andrew Morton <akpm@linux-foundation.org> · 2021-05-15
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Guenter Roeck <linux@roeck-us.net> · 2021-05-15
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Aneesh Kumar K.V <hidden> · 2021-05-17
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Guenter Roeck <linux@roeck-us.net> · 2021-05-17
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Aneesh Kumar K.V <hidden> · 2021-05-17
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Guenter Roeck <linux@roeck-us.net> · 2021-05-17
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Michael Ellerman <mpe@ellerman.id.au> · 2021-05-19
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Segher Boessenkool <hidden> · 2021-05-19
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Segher Boessenkool <hidden> · 2021-05-19
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Guenter Roeck <linux@roeck-us.net> · 2021-05-19
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Segher Boessenkool <hidden> · 2021-05-19
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Guenter Roeck <linux@roeck-us.net> · 2021-05-19
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Michael Ellerman <mpe@ellerman.id.au> · 2021-05-20
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Segher Boessenkool <hidden> · 2021-05-20
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Guenter Roeck <linux@roeck-us.net> · 2021-05-19
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Michael Ellerman <mpe@ellerman.id.au> · 2021-05-20
Re: [PATCH v5 5/9] powerpc/mm/book3s64: Update tlb flush routines to take a page walk cache flush argument · Guenter Roeck <linux@roeck-us.net> · 2021-05-20
[PATCH v5 6/9] mm/mremap: Use range flush that does TLB and page walk cache flush · Aneesh Kumar K.V <hidden> · 2021-04-22
[PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Aneesh Kumar K.V <hidden> · 2021-04-22
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Aneesh Kumar K.V <hidden> · 2021-05-20
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Aneesh Kumar K.V <hidden> · 2021-05-20
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Linus Torvalds <torvalds@linux-foundation.org> · 2021-05-21
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Aneesh Kumar K.V <hidden> · 2021-05-21
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Aneesh Kumar K.V <hidden> · 2021-05-21
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Linus Torvalds <torvalds@linux-foundation.org> · 2021-05-21
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Aneesh Kumar K.V <hidden> · 2021-05-21
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Aneesh Kumar K.V <hidden> · 2021-05-21
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Linus Torvalds <torvalds@linux-foundation.org> · 2021-05-21
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Aneesh Kumar K.V <hidden> · 2021-05-21
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Aneesh Kumar K.V <hidden> · 2021-05-24
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Liam Howlett <hidden> · 2021-05-21
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Aneesh Kumar K.V <hidden> · 2021-05-21
Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock · Linus Torvalds <torvalds@linux-foundation.org> · 2021-05-21
[PATCH v5 8/9] mm/mremap: Allow arch runtime override · Aneesh Kumar K.V <hidden> · 2021-04-22
[PATCH v5 9/9] powerpc/mm: Enable move pmd/pud · Aneesh Kumar K.V <hidden> · 2021-04-22
Re: [PATCH v5 9/9] powerpc/mm: Enable move pmd/pud · Andrew Morton <akpm@linux-foundation.org> · 2021-05-11

From: Liam Howlett <hidden>
Date: 2021-05-21 15:25:09
Also in: linux-mm

* Aneesh Kumar K.V [off-list ref] [210521 08:51]:

On 5/21/21 11:43 AM, Linus Torvalds wrote:

quoted

On Thu, May 20, 2021 at 5:03 PM Aneesh Kumar K.V
[off-list ref] wrote:

quoted

On 5/21/21 8:10 AM, Linus Torvalds wrote:

quoted

So mremap does need to flush the TLB before releasing the page table
lock, because that's the lifetime boundary for the page that got
moved.

How will we avoid that happening with
c49dd340180260c6239e453263a9a244da9a7c85 /
2c91bd4a4e2e530582d6fd643ea7b86b27907151 . The commit improves mremap
performance by moving level3/level2 page table entries. When doing so we
are not holding level 4 ptl lock (pte_lock()). But rather we are holding
pmd_lock or pud_lock(). So if we move pages around without holding the
pte lock, won't the above issue happen even if we do a tlb flush with
holding pmd lock/pud lock?

Hmm. Interesting.

Your patch (to flush the TLB after clearing the old location, and
before inserting it into the new one) looks like an "obvious" fix.

But I'm putting that "obvious" in quotes, because I'm now wondering if
it actually fixes anything.

Lookie here:

  - CPU1 does a mremap of a pmd or pud.

     It clears the old pmd/pud, flushes the old TLB range, and then
inserts the pmd/pud at the new location.

  - CPU2 does a page shrinker, which calls try_to_unmap, which calls
try_to_unmap_one.

These are entirely asynchronous, because they have no shared lock. The
mremap uses the pmd lock, the try_to_unmap_one() does the rmap walk,
which does the pte lock.

Now, imagine that the following ordering happens with the two
operations above, and a CPU3 that does accesses:

  - CPU2 follows (and sees) the old page tables in the old location and
the took the pte lock

  - the mremap on CPU1 starts - cleared the old pmd, flushed the tlb,
*and* inserts in the new place.

  - a user thread on CPU3 accesses the new location and fills the TLB
of the *new* address

mremap holds the mmap_sem in write mode as well, doesn't it?  How is the user thread
getting the new location?

quoted

  - only now does CPU2 get to the "pte_get_and_clear()" to remove one page

  - CPU2 does a TLB flush and frees the page

End result:

  - both CPU1 _and_ CPU2 have flushed the TLB.

  - but both flushed the *OLD* address

  - the page is freed

  - CPU3 still has the stale TLB entry pointing to the page that is now
free and might be reused for something else

Am I missing something?

That is a problem. With that it looks like CONFIG_HAVE_MOVE_PMD/PUD is
broken? I don't see an easy way to fix this?

-aneesh

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help