Thread (41 messages) 41 messages, 6 authors, 2022-06-19

Re: [PATCH v11 07/14] mm: multi-gen LRU: exploit locality in rmap

From: Barry Song <hidden>
Date: 2022-06-19 21:56:50
Also in: linux-doc, linux-mm, lkml

On Mon, Jun 20, 2022 at 8:37 AM Yu Zhao [off-list ref] wrote:
On Thu, Jun 16, 2022 at 9:17 PM Yu Zhao [off-list ref] wrote:
quoted
On Thu, Jun 16, 2022 at 9:03 PM Yu Zhao [off-list ref] wrote:
quoted
On Thu, Jun 16, 2022 at 8:01 PM Barry Song [off-list ref] wrote:
quoted
On Fri, Jun 17, 2022 at 1:43 PM Yu Zhao [off-list ref] wrote:
quoted
On Thu, Jun 16, 2022 at 5:29 PM Yu Zhao [off-list ref] wrote:
quoted
On Thu, Jun 16, 2022 at 4:33 PM Barry Song [off-list ref] wrote:
quoted
On Fri, Jun 17, 2022 at 9:56 AM Yu Zhao [off-list ref] wrote:
quoted
On Wed, Jun 8, 2022 at 4:46 PM Barry Song [off-list ref] wrote:
quoted
On Thu, Jun 9, 2022 at 3:52 AM Linus Torvalds
[off-list ref] wrote:
quoted
On Tue, Jun 7, 2022 at 5:43 PM Barry Song [off-list ref] wrote:
quoted
Given we used to have a flush for clear pte young in LRU, right now we are
moving to nop in almost all cases for the flush unless the address becomes
young exactly after look_around and before ptep_clear_flush_young_notify.
It means we are actually dropping flush. So the question is,  were we
overcautious? we actually don't need the flush at all even without mglru?
We stopped flushing the TLB on A bit clears on x86 back in 2014.

See commit b13b1d2d8692 ("x86/mm: In the PTE swapout page reclaim case
clear the accessed bit instead of flushing the TLB").
This is true for x86, RISC-V, powerpc and S390. but it is not true for
most platforms.

There was an attempt to do the same thing in arm64:
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1793830.html
but arm64 still sent a nosync tlbi and depent on a deferred to dsb :
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1794484.html
Barry, you've already answered your own question.

Without commit 07509e10dcc7 arm64: pgtable: Fix pte_accessible():
   #define pte_accessible(mm, pte)        \
  -       (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid_young(pte))
  +       (mm_tlb_flush_pending(mm) ? pte_present(pte) : pte_valid(pte))

You missed all TLB flushes for PTEs that have gone through
ptep_test_and_clear_young() on the reclaim path. But most of the time,
you got away with it, only occasional app crashes:
https://lore.kernel.org/r/CAGsJ_4w6JjuG4rn2P=d974wBOUtXUUnaZKnx+-G6a8_mSROa+Q@mail.gmail.com/ (local)

Why?
Yes. On the arm64 platform, ptep_test_and_clear_young() without flush
can cause random
App to crash.
ptep_test_and_clear_young() + flush won't have this kind of crashes though.
But after applying commit 07509e10dcc7 arm64: pgtable: Fix
pte_accessible(), on arm64,
ptep_test_and_clear_young() without flush won't cause App to crash.

ptep_test_and_clear_young(), with flush, without commit 07509e10dcc7:   OK
ptep_test_and_clear_young(), without flush, with commit 07509e10dcc7:   OK
ptep_test_and_clear_young(), without flush, without commit 07509e10dcc7:   CRASH
I agree -- my question was rhetorical :)

I was trying to imply this logic:
1. We cleared the A-bit in PTEs with ptep_test_and_clear_young()
2. We missed TLB flush for those PTEs on the reclaim path, i.e., case
3 (case 1 & 2 guarantee flushes)
3. We saw crashes, but only occasionally

Assuming TLB cached those PTEs, we would have seen the crashes more
often, which contradicts our observation. So the conclusion is TLB
didn't cache them most of the time, meaning flushing TLB just for the
sake of the A-bit isn't necessary.
quoted
do you think it is safe to totally remove the flush code even for
the original
LRU?
Affirmative, based on not only my words, but 3rd parties':
1. Your (indirect) observation
2. Alexander's benchmark:
https://lore.kernel.org/r/BYAPR12MB271295B398729E07F31082A7CFAA0@BYAPR12MB2712.namprd12.prod.outlook.com/ (local)
3. The fundamental hardware limitation in terms of the TLB scalability
(Fig. 1): https://www.usenix.org/legacy/events/osdi02/tech/full_papers/navarro/navarro.pdf
4. Intel's commit b13b1d2d8692 ("x86/mm: In the PTE swapout page
reclaim case clear the accessed bit instead of flushing the TLB")
Hi Yu,
I am going to send a RFC based on the above discussion.
diff --git a/mm/rmap.c b/mm/rmap.c
index 5bcb334cd6f2..7ce6f0b6c330 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -830,7 +830,7 @@ static bool folio_referenced_one(struct folio *folio,
                }

                if (pvmw.pte) {
-                       if (ptep_clear_flush_young_notify(vma, address,
+                       if (ptep_clear_young_notify(vma, address,
                                                pvmw.pte)) {
                                /*
                                 * Don't treat a reference through
Thanks!

This might make a difference on my 64 core Altra -- I'll test after
you post the RFC.
Also, IIRC, it made no difference on POWER9 because POWER9
flushes TBL regardless which variant is used.
  ^^^^^^^ doesn't flush

I just verified this on POWER9. So on top of exhibit 1-4, we got:
  5. 3cb1aa7aa3940 ("powerpc/64s: Implement ptep_clear_flush_young
that does not flush TLBs")
Thanks, Yu. I put a rfc,
https://lore.kernel.org/lkml/20220617070555.344368-1-21cnbao@gmail.com/ (local)

we may clarify everything in that thread :-)

Thanks
Barry

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help