Thread (41 messages) 41 messages, 6 authors, 2022-06-19

Re: [PATCH v11 07/14] mm: multi-gen LRU: exploit locality in rmap

From: Will Deacon <will@kernel.org>
Date: 2022-06-07 10:44:29
Also in: linux-doc, linux-mm, lkml

On Tue, Jun 07, 2022 at 10:37:46AM +1200, Barry Song wrote:
On Tue, Jun 7, 2022 at 10:21 PM Will Deacon [off-list ref] wrote:
quoted
On Tue, Jun 07, 2022 at 07:37:10PM +1200, Barry Song wrote:
quoted
I can't really explain why we are getting a random app/java vm crash in monkey
test by using ptep_test_and_clear_young() only in lru_gen_look_around() on an
armv8-a machine without hardware PTE young support.

Moving to  ptep_clear_flush_young() in look_around can make the random
hang disappear according to zhanyuan(Cc-ed).

On x86, ptep_clear_flush_young() is exactly ptep_test_and_clear_young()
after
 'commit b13b1d2d8692 ("x86/mm: In the PTE swapout page reclaim case clear
the accessed bit instead of flushing the TLB")'

But on arm64, they are different. according to Will's comments in this
thread which
tried to make arm64 same with x86,
https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1793881.html

"
This is blindly copied from x86 and isn't true for us: we don't invalidate
the TLB on context switch. That means our window for keeping the stale
entries around is potentially much bigger and might not be a great idea.

If we roll a TLB invalidation routine without the trailing DSB, what sort of
performance does that get you?
"
We shouldn't think ptep_clear_flush_young() is safe enough in LRU to
clear PTE young? Any comments from Will?
Given that this issue is specific to the multi-gen LRU work, I think Yu is
the best person to comment. However, looking quickly at your analysis above,
I wonder if the code is relying on this sequence:


        ptep_test_and_clear_young(vma, address, ptep);
        ptep_clear_flush_young(vma, address, ptep);


to invalidate the TLB. On arm64, that won't be the case, as the invalidation
in ptep_clear_flush_young() is predicated on the pte being young (and this
patches the generic implementation in mm/pgtable-generic.c. In fact, that
second function call is always going to be a no-op unless the pte became
young again in the middle.
thanks for your reply, sorry for failing to let you understand my question.
my question is actually as below,
right now  lru_gen_look_around() is using ptep_test_and_clear_young()
only without flush to clear pte for a couple of pages including the specific
address:
void lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
{
       ...

       for (i = 0, addr = start; addr != end; i++, addr += PAGE_SIZE) {
               ...

               if (!ptep_test_and_clear_young(pvmw->vma, addr, pte + i))
                       continue;

               ...
}

I wonder if it is safe to arm64. Do we need to move to ptep_clear_flush_young()
in the loop?
I don't know what this code is doing, so Yu is the best person to answer
that. There's nothing inherently dangerous about eliding the TLB
maintenance; it really depends on the guarantees needed by the caller.

However, the snippet you posted from folio_referenced_one():

 |                  if (pvmw.pte) {
 |  +                       if (lru_gen_enabled() && pte_young(*pvmw.pte) &&
 |  +                           !(vma->vm_flags & (VM_SEQ_READ | VM_RAND_READ))) {
 |  +                               lru_gen_look_around(&pvmw);
 |  +                               referenced++;
 |  +                       }
 |  +
 |                          if (ptep_clear_flush_young_notify(vma, address,


Does seem to call lru_gen_look_around() *and*
ptep_clear_flush_young_notify(), which is what prompted my question as it
looks pretty suspicious to me.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help