Thread: 24 messages, 5 authors, 2024-01-30

Re: [PATCH RFC 00/12] x86 NUMA-aware kernel replication

From: Artem Kuzin <hidden>
Date: 2024-01-29 07:51:28
Also in: linux-mm

On 1/25/2024 7:30 AM, Garg, Shivank wrote:
Hi Artem,
Preliminary performance evaluation results:
Processor Intel(R) Xeon(R) CPU E5-2690
2 nodes with 12 CPU cores for each one

fork/1 - The system call is invoked only once.
         The time between entering and exiting the system call is measured.

fork/1024 - The system call is invoked in a loop 1024 times.
            The time between entering the loop and exiting it is measured.

mmap/munmap - A set of 1024 pages (PAGE_SIZE defaults to 4096 if it is not defined)
              was mapped using the mmap syscall and unmapped using munmap.
              One page is mapped/unmapped per loop iteration.

mmap/lock - The same as above, but with the MAP_LOCKED flag added.

open/close - The /dev/null pseudo-file was opened and closed in a loop 1024 times.
             It was opened and closed once per iteration.

mount - The procfs pseudo-filesystem was mounted once on a temporary directory inside /tmp.
        The time between entering and exiting the system call was measured.

kill - A signal handler for SIGUSR1 was set up. The signal was sent to a child process,
       which was created using glibc's fork wrapper. The time between sending and
       receiving SIGUSR1 was measured.
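
For readers who want to reproduce the loop-style tests, here is a minimal sketch of how the fork/N, mmap/munmap (and mmap/lock) and open/close measurements could be written. It is not the test code used for the numbers in this thread. The helper names (now_ns, bench_fork, bench_mmap, bench_open_close), the choice of CLOCK_MONOTONIC, the anonymous private mapping, and the decision to reap children outside the timed region are all assumptions made for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096          /* same fallback as in the description */
#endif
#define ITERATIONS 1024

/* Monotonic timestamp in nanoseconds (assumed clock; the mail names none). */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* fork/N: time n fork() calls in a loop. Children exit immediately and
 * are reaped only after the timed region (an assumption). */
static int64_t bench_fork(int n)
{
    static pid_t pids[ITERATIONS];
    int forked = 0;
    int64_t elapsed;

    if (n < 1 || n > ITERATIONS)
        return -1;

    uint64_t start = now_ns();
    while (forked < n) {
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);           /* child: leave without flushing stdio */
        if (pid < 0)
            break;
        pids[forked++] = pid;
    }
    elapsed = (int64_t)(now_ns() - start);

    for (int i = 0; i < forked; i++)
        waitpid(pids[i], NULL, 0);
    return forked == n ? elapsed : -1;
}

/* mmap/munmap: map and unmap one page per iteration.
 * Pass MAP_LOCKED in extra_flags for the mmap/lock variant, 0 otherwise. */
static int64_t bench_mmap(int n, int extra_flags)
{
    uint64_t start = now_ns();

    for (int i = 0; i < n; i++) {
        void *p = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
        if (p == MAP_FAILED)
            return -1;
        if (munmap(p, PAGE_SIZE) != 0)
            return -1;
    }
    return (int64_t)(now_ns() - start);
}

/* open/close: open and close /dev/null once per iteration. */
static int64_t bench_open_close(int n)
{
    uint64_t start = now_ns();

    for (int i = 0; i < n; i++) {
        int fd = open("/dev/null", O_RDONLY);
        if (fd < 0)
            return -1;
        close(fd);
    }
    return (int64_t)(now_ns() - start);
}
```

Each function returns the elapsed time in nanoseconds, or -1 on failure; bench_fork(1) corresponds to fork/1 and bench_fork(ITERATIONS) to fork/1024. The mmap/lock variant, bench_mmap(ITERATIONS, MAP_LOCKED), may fail if RLIMIT_MEMLOCK is too low for the calling user.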

Hot caches:

fork-1          2.3%
fork-1024       10.8%
mmap/munmap     0.4%
mmap/lock       4.2%
open/close      3.2%
kill            4%
mount           8.7%

Cold caches:

fork-1          42.7%
fork-1024       17.1%
mmap/munmap     0.4%
mmap/lock       1.5%
open/close      0.4%
kill            26.1%
mount           4.1%

I've conducted some testing on an AMD EPYC 7713 64-Core processor (dual socket, 2 NUMA nodes, 64 CPUs on each node) to evaluate the performance with this patchset.
I've implemented the syscall-based test cases as suggested in your previous mail. I'm shielding the 2nd NUMA node using isolcpus and nohz_full, and executing the tests on CPUs belonging to this node.
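
For anyone repeating this setup: besides isolating the node at boot, each test process has to be placed on one of the shielded CPUs. A minimal sketch of that step follows, assuming sched_setaffinity() is used for placement. The mail does not say how the tests were actually pinned, and the helper name pin_to_cpu is hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for sched_setaffinity() and CPU_* macros */
#endif
#include <sched.h>

/* Pin the calling thread to a single CPU.
 * Returns 0 on success, -1 on an invalid CPU number or syscall failure. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

The CPU numbers that belong to the second node depend on the machine; they can be read from /sys/devices/system/node/node1/cpulist before calling pin_to_cpu() at the start of each test.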

Performance Evaluation results (% gain over base kernel 6.5.0-rc5):

Hot caches:
fork-1          1.1%
fork-1024       -3.8%
mmap/munmap     -1.5%
mmap/lock       -4.7%
open/close      -6.8%
kill            3.3%
mount           -13.0%

Cold caches:
fork-1          1.2%
fork-1024       -7.2%
mmap/munmap     -1.6%
mmap/lock       -1.0%
open/close      4.6%
kill            -54.2%
mount           -8.5%

Thanks,
Shivank

Hi Shivank, thank you for the performance evaluation. Unfortunately, we don't have an AMD EPYC machine right now;
I'll try to find a way to perform measurements and clarify why there is such a difference.

We are currently trying to run a performance evaluation using database-related benchmarks.
I will return with the results after clarification.

BR