Thread: 134 messages, 5 authors, 2017-10-30

Re: [PATCH 4/7] powerpc: Free up four 64K PTE bits in 64K backed HPTE pages

From: Ram Pai <hidden>
Date: 2017-10-23 23:42:56
Subsystem: kernel selftest framework, the rest · Maintainers: Shuah Khan, Linus Torvalds

On Mon, Oct 23, 2017 at 02:22:44PM +0530, Aneesh Kumar K.V wrote:
Benjamin Herrenschmidt [off-list ref] writes:
On Fri, 2017-09-08 at 15:44 -0700, Ram Pai wrote:
The second part of the PTE will hold
(H_PAGE_F_SECOND|H_PAGE_F_GIX) at bits 60, 61, 62 and 63.
NOTE: None of the bits in the secondary PTE were used
by 64K-HPTE backed PTEs.
Have you measured the performance impact of this? With the second part
of the PTE being in a different cache line, there could be one...
I am also looking at a patch series removing the slot tracking
completely. With address randomization turned off, no swap in guest/host,
and making sure we touched most of guest RAM, I don't find much impact
on performance when we don't track the slot at all. I will post the
patch series with numbers in a day or two. But my test was

repeat 5000 times {
      mmap(128M)
      touch each of the 2048 64K pages
      munmap()
}

It could also be the best case in my run, because I might have always
found the hash PTE slot in the primary. In one measurement with swap on
and address randomization enabled, I did find a 50% impact. But then I
was not able to recreate that again. So it could be something I did wrong
in the test setup.

Ram,

Will you be able to get a test run with the above loop?

Yes. Results with the patch look good; they are better than without the patch.


/--------------------------------------------\
| Iteration | secs w/ patch | secs w/o patch |
|--------------------------------------------|
| 1         | 45.572621     | 49.046994      |
| 2         | 46.049545     | 49.378756      |
| 3         | 46.103657     | 49.223591      |
| 4         | 46.298903     | 48.991245      |
| 5         | 46.353202     | 48.988033      |
| 6         | 45.440878     | 49.175846      |
| 7         | 46.860373     | 49.008395      |
| 8         | 46.221390     | 49.236964      |
| 9         | 45.794993     | 49.171927      |
| 10        | 46.569491     | 48.995628      |
|--------------------------------------------|
| average   | 46.1265053    | 49.1217379     |
\--------------------------------------------/


The code is as follows:

diff --git a/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c b/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c
index 8d084a2..ef2ad87 100644
--- a/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c
+++ b/tools/testing/selftests/powerpc/benchmarks/mmap_bench.c
@@ -10,14 +10,14 @@
 
 #include "utils.h"
 
-#define ITERATIONS 5000000
+#define ITERATIONS 5000
 
 #define MEMSIZE (128 * 1024 * 1024)
 
 int test_mmap(void)
 {
 	struct timespec ts_start, ts_end;
-	unsigned long i = ITERATIONS;
+	unsigned long i = ITERATIONS, j;
 
 	clock_gettime(CLOCK_MONOTONIC, &ts_start);
 
@@ -25,6 +25,10 @@ int test_mmap(void)
 		char *c = mmap(NULL, MEMSIZE, PROT_READ|PROT_WRITE,
 			       MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
 		FAIL_IF(c == MAP_FAILED);
+
+		for (j=0; j < (MEMSIZE >> 16); j++)
+			c[j<<16] = 0xf;
+
 		munmap(c, MEMSIZE);
 	}
 