Re: [RFC PATCH] powerpc/book3s64/radix: Upgrade va tlbie to PID tlbie if we cross PMD_SIZE
From: Nicholas Piggin <npiggin@gmail.com>
Date: 2021-08-04 05:15:08
Excerpts from Aneesh Kumar K.V's message of August 4, 2021 12:37 am:
With shared mapping, even though we are unmapping a large range, the kernel
will force a TLB flush with ptl lock held to avoid the race mentioned in
commit 1cf35d47712d ("mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts")
This results in the kernel issuing a high number of TLB flushes even for a large
range. This can be improved by making sure the kernel switch to pid based flush if the
kernel is unmapping a 2M range.

It would be good to have a bit more description here.

In any patch that changes a heuristic like this, I would like to see some
justification or reasoning that could be refuted or used as a supporting
argument if we ever wanted to change the heuristic later. Ideally with some
of the obvious downsides listed as well.

This "improves" things here, but what if it hurt things elsewhere, how would
we come in later and decide to change it back?

THP flushes for example, I think now they'll do PID flushes (if they have to
be broadcast, which they will tend to be when khugepaged does them). So now
that might increase jitter for THP and cause it to be a loss for more
workloads.

So where do you notice this? What's the benefit?
Signed-off-by: Aneesh Kumar K.V <redacted>
---
 arch/powerpc/mm/book3s64/radix_tlb.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index aefc100d79a7..21d0f098e43b 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -1106,7 +1106,7 @@ EXPORT_SYMBOL(radix__flush_tlb_kernel_range);
  * invalidating a full PID, so it has a far lower threshold to change from
  * individual page flushes to full-pid flushes.
  */
-static unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
+static unsigned long tlb_single_page_flush_ceiling __read_mostly = 32;
 static unsigned long tlb_local_single_page_flush_ceiling __read_mostly = POWER9_TLB_SETS_RADIX * 2;
 
 static inline void __radix__flush_tlb_range(struct mm_struct *mm,
@@ -1133,7 +1133,7 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
 	if (fullmm)
 		flush_pid = true;
 	else if (type == FLUSH_TYPE_GLOBAL)
-		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
+		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
Arguably >= is nicer than > here, but this shouldn't be in the same patch as the value change.
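To make the combined effect of the two changes concrete, here is a minimal sketch of the old and new global-flush checks. The 33/32 values and the two comparisons come from the quoted diff; the helper names are hypothetical, and the 64K base page size used in the note below is an assumption that is not stated in this thread.

```c
#include <stdbool.h>

/* Ceiling values taken from the quoted diff. */
#define OLD_CEILING 33UL	/* compared with '>'  before the patch */
#define NEW_CEILING 32UL	/* compared with '>=' after the patch */

/* Hypothetical helper: number of base pages covering [start, end). */
static unsigned long range_nr_pages(unsigned long start, unsigned long end,
				    unsigned int page_shift)
{
	return (end - start) >> page_shift;
}

/* Global-flush decision before the patch. */
static bool old_flush_pid(unsigned long nr_pages)
{
	return nr_pages > OLD_CEILING;
}

/* Global-flush decision after the patch. */
static bool new_flush_pid(unsigned long nr_pages)
{
	return nr_pages >= NEW_CEILING;
}
```

Assuming 64K base pages (page_shift of 16), a 2M range is 32 pages: the old check keeps it on va tlbie, the new one upgrades it to a PID flush. 32 and 33 pages are the only sizes whose outcome changes, which is part of why the operator change and the value change are easier to review as separate patches.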
 	else
 		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
And it should change everything to be consistent. Although I'm not sure it's
worth changing, even though I highly doubt any administrator would be
tweaking this.

Thanks,
Nick
 	/*
@@ -1335,7 +1335,7 @@ static void __radix__flush_tlb_range_psize(struct mm_struct *mm,
 	if (fullmm)
 		flush_pid = true;
 	else if (type == FLUSH_TYPE_GLOBAL)
-		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
+		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
 	else
 		flush_pid = nr_pages > tlb_local_single_page_flush_ceiling;
@@ -1505,7 +1505,7 @@ void do_h_rpt_invalidate_prt(unsigned long pid, unsigned long lpid,
 			continue;
 
 		nr_pages = (end - start) >> def->shift;
-		flush_pid = nr_pages > tlb_single_page_flush_ceiling;
+		flush_pid = nr_pages >= tlb_single_page_flush_ceiling;
 
 		/*
 		 * If the number of pages spanning the range is above
-- 
2.31.1