Thread: 33 messages, 5 authors, 2017-08-25

Re: [PATCH 5/6] powerpc/mm: Optimize detection of thread local mm's

From: Nicholas Piggin <npiggin@gmail.com>
Date: 2017-07-24 11:25:50

On Mon, 24 Jul 2017 14:28:02 +1000
Benjamin Herrenschmidt [off-list ref] wrote:
Instead of comparing the whole CPU mask every time, let's
keep a counter of how many bits are set in the mask. Thus
testing for a local mm only requires testing if that counter
is 1 and the current CPU bit is set in the mask.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/book3s/64/mmu.h |  3 +++
 arch/powerpc/include/asm/mmu_context.h   |  9 +++++++++
 arch/powerpc/include/asm/tlb.h           | 11 ++++++++++-
 arch/powerpc/mm/mmu_context_book3s64.c   |  2 ++
 4 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index 1a220cdff923..c3b00e8ff791 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -83,6 +83,9 @@ typedef struct {
 	mm_context_id_t id;
 	u16 user_psize;		/* page size index */
 
+	/* Number of bits in the mm_cpumask */
+	atomic_t active_cpus;
+
 	/* NPU NMMU context */
 	struct npu_context *npu_context;
 
diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index ff1aeb2cd19f..cf8f50cd4030 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -96,6 +96,14 @@ static inline void switch_mm_pgdir(struct task_struct *tsk,
 				   struct mm_struct *mm) { }
 #endif
 
+#ifdef CONFIG_PPC_BOOK3S_64
+static inline void inc_mm_active_cpus(struct mm_struct *mm)
+{
+	atomic_inc(&mm->context.active_cpus);
+}
+#else
+static inline void inc_mm_active_cpus(struct mm_struct *mm) { }
+#endif

This is a bit awkward. Could we move the whole sequence (test the cpumask
bit, set it, increment the counter) into a helper function and define it
together with mm_is_thread_local, so it's all in one place?
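
A minimal userspace sketch of what such grouped helpers might look like, using C11 atomics in place of the kernel's atomic_t and cpumask API. The names mm_model, mm_model_mark_cpu_active and mm_model_is_thread_local are hypothetical and do not come from the patch, and the single-word mask is an assumption made to keep the example short.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: a one-word CPU mask plus a count of set bits. */
#define MODEL_NR_CPUS (sizeof(unsigned long) * CHAR_BIT)

struct mm_model {
	_Atomic unsigned long cpumask;	/* stands in for mm_cpumask(mm) */
	atomic_int active_cpus;		/* number of bits set in cpumask */
};

/*
 * Test the CPU's bit, set it if clear, and increment the counter only
 * when this call performed the 0 -> 1 transition.
 */
static inline void mm_model_mark_cpu_active(struct mm_model *mm,
					    unsigned int cpu)
{
	unsigned long bit;

	assert(cpu < MODEL_NR_CPUS);
	bit = 1UL << cpu;

	/* Cheap read first so the common "already set" case stays read-only. */
	if (atomic_load(&mm->cpumask) & bit)
		return;

	/* fetch_or returns the old value, so only one setter sees the bit clear. */
	if (!(atomic_fetch_or(&mm->cpumask, bit) & bit))
		atomic_fetch_add(&mm->active_cpus, 1);
}

/* Local means: exactly one CPU is in the mask, and it is this one. */
static inline bool mm_model_is_thread_local(struct mm_model *mm,
					    unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return atomic_load(&mm->active_cpus) == 1 &&
	       (atomic_load(&mm->cpumask) & (1UL << cpu)) != 0;
}
```

Keeping the setter and the test side by side makes the invariant (counter equals the number of set bits) visible in one place, and both could then be compiled out together on configurations that do not use the counter.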

The extra atomic also does not need to be defined on configurations where
it is not used.

Also, does it make sense to enable this only when NR_CPUS > BITS_PER_LONG?
When NR_CPUS <= BITS_PER_LONG the mask fits in a single word, so comparing
it should be a similar load and compare, no?
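
To illustrate the NR_CPUS question, here is a small userspace sketch contrasting the two cases. It is not kernel code: local_by_word_compare and local_by_mask_scan are hypothetical names, and plain unsigned long arrays stand in for a cpumask. With a one-word mask the locality test is a single load and compare, much like reading a counter; only a multi-word mask needs a scan that a counter would avoid.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* One-word mask: local iff the mask equals exactly this CPU's bit. */
static inline bool local_by_word_compare(unsigned long mask, unsigned int cpu)
{
	assert(cpu < SKETCH_BITS_PER_LONG);
	return mask == (1UL << cpu);
}

/*
 * Multi-word mask: every word must be zero except the one holding this
 * CPU's bit, which must contain only that bit. Cost grows with nwords.
 */
static inline bool local_by_mask_scan(const unsigned long *mask,
				      size_t nwords, unsigned int cpu)
{
	size_t word = cpu / SKETCH_BITS_PER_LONG;
	unsigned long bit = 1UL << (cpu % SKETCH_BITS_PER_LONG);

	assert(word < nwords);
	for (size_t i = 0; i < nwords; i++) {
		unsigned long expect = (i == word) ? bit : 0UL;

		if (mask[i] != expect)
			return false;
	}
	return true;
}
```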

Looks like a good optimisation though.

Thanks,
Nick