Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements

[RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements · Nicholas Piggin <npiggin@gmail.com> · 2017-10-31
[RFC PATCH 1/7] powerpc/64s/radix: optimize TLB range flush barriers · Nicholas Piggin <npiggin@gmail.com> · 2017-10-31
[RFC PATCH 2/7] powerpc/64s/radix: Implement _tlbie(l)_va_range flush functions · Nicholas Piggin <npiggin@gmail.com> · 2017-10-31
[RFC PATCH 3/7] powerpc/64s/radix: Optimize flush_tlb_range · Nicholas Piggin <npiggin@gmail.com> · 2017-10-31
[RFC PATCH 4/7] powerpc/64s/radix: Introduce local single page ceiling for TLB range flush · Nicholas Piggin <npiggin@gmail.com> · 2017-10-31
[RFC PATCH 5/7] powerpc/64s/radix: Improve TLB flushing for page table freeing · Nicholas Piggin <npiggin@gmail.com> · 2017-10-31
Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements · Anshuman Khandual <hidden> · 2017-11-01
Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements · Nicholas Piggin <npiggin@gmail.com> · 2017-11-01
Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements · Anshuman Khandual <hidden> · 2017-11-02
Re: [RFC PATCH 0/7] powerpc/64s/radix TLB flush performance improvements · Nicholas Piggin <npiggin@gmail.com> · 2017-11-02

From: Nicholas Piggin <npiggin@gmail.com>
Date: 2017-11-01 13:40:15

On Wed, 1 Nov 2017 17:35:51 +0530
Anshuman Khandual [off-list ref] wrote:

> On 10/31/2017 12:14 PM, Nicholas Piggin wrote:
>
>> Here's a random mix of performance improvements for radix TLB flushing
>> code. The main aims are to reduce the amount of translation that gets
>> invalidated, and to reduce global flushes where we can do local.
>>
>> To that end, a parallel kernel compile benchmark using the powerpc:tlbie
>> tracepoint shows a reduction in tlbie instructions from about 290,000
>> to 80,000, and a reduction in tlbiel instructions from 49,500,000 to
>> 15,000,000. Looks great, but unfortunately this does not translate to a
>> statistically significant performance improvement! The needle on TLB
>> misses does not move much, I suspect because a lot of the flushing is
>> done at startup and shutdown, and because a significant cost of TLB
>> flushing itself is in the barriers.
>
> Does the memory barrier initiate a single global invalidation with tlbie?

I'm not quite sure what you're asking, and I don't know the details
of how the hardware handles it, but the measurements in patch 1 of
the series show that both tlbie and tlbiel benefit from being batched
up between barriers.

Thanks,
Nick
