Re: [PATCH RFC 04/12] x86: add support of memory protection for NUMA replicas
From: a00561249@china.huawei.com <hidden>
Date: 2024-01-09 15:53:10
Also in:
linux-mm
Hi Shivank,

thanks a lot for the comments and findings. I've fixed the build and plan to update the patch set soon.

On 1/9/2024 9:46 AM, Garg, Shivank wrote:
> Hi Artem,
>
> I hope this message finds you well.
> I've encountered a compilation issue when KERNEL_REPLICATION is disabled in the config.
>
> ld: vmlinux.o: in function `alloc_insn_page':
> /home/amd/linux_mainline/arch/x86/kernel/kprobes/core.c:425: undefined reference to `numa_set_memory_rox'
> ld: vmlinux.o: in function `alloc_new_pack':
> /home/amd/linux_mainline/kernel/bpf/core.c:873: undefined reference to `numa_set_memory_rox'
> ld: vmlinux.o: in function `bpf_prog_pack_alloc':
> /home/amd/linux_mainline/kernel/bpf/core.c:891: undefined reference to `numa_set_memory_rox'
> ld: vmlinux.o: in function `bpf_trampoline_update':
> /home/amd/linux_mainline/kernel/bpf/trampoline.c:447: undefined reference to `numa_set_memory_rox'
> ld: vmlinux.o: in function `bpf_struct_ops_map_update_elem':
> /home/amd/linux_mainline/kernel/bpf/bpf_struct_ops.c:515: undefined reference to `numa_set_memory_rox'
> ld: vmlinux.o:/home/amd/linux_mainline/kernel/bpf/bpf_struct_ops.c:524: more undefined references to `numa_set_memory_rox' follow
>
> After some investigation, I've put together a patch that resolves this compilation issue for me.
>
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2268,6 +2268,15 @@ int numa_set_memory_nonglobal(unsigned long addr, int numpages)
>  	return ret;
>  }
> +
> +#else
> +
> +int numa_set_memory_rox(unsigned long addr, int numpages)
> +{
> +	return set_memory_rox(addr, numpages);
> +
> +}
> +
>  #endif
>
> Additionally, I'm interested in evaluating the performance impact of this patchset on AMD processors.
> Could you please point me to the benchmarks that you used in the cover letter?
>
> Best Regards,
> Shivank
Regarding the benchmarks, for now we used a self-implemented test that generates system call load. We used the RedHawk Linux approach as a reference, as described in the article "An Overview of Kernel Text Page Replication in RedHawk™ Linux® 6.3":

https://concurrent-rt.com/wp-content/uploads/2020/12/kernel-page-replication.pdf

The test is very simple. All measured system calls are invoked through the syscall wrapper from glibc, e.g.

    #include <sys/syscall.h>      /* Definition of SYS_* constants */
    #include <unistd.h>

    long syscall(long number, ...);

fork/1
    The system call is invoked only once. The time between entering
    and exiting the system call is measured.

fork/1024
    The system call is invoked in a loop 1024 times. The time between
    entering the loop and exiting it is measured.

mmap/munmap
    A set of 1024 pages (the page size is taken as 4096 if PAGE_SIZE
    is not defined) is mapped with the mmap syscall and unmapped with
    munmap. One page is mapped and unmapped per loop iteration.

mmap/lock
    The same as above, but with the MAP_LOCKED flag added.

open/close
    The /dev/null pseudo-file is opened and closed in a loop 1024
    times, once per iteration.

mount
    The procfs pseudo-filesystem is mounted once to a temporary
    directory inside /tmp. The time between entering and exiting the
    system call is measured.

kill
    A signal handler for SIGUSR1 is set up. The signal is sent to a
    child process created with glibc's fork wrapper. The time between
    sending and receiving SIGUSR1 is measured.

Testing environment:
    Processor: Intel(R) Xeon(R) CPU E5-2690
    2 nodes with 12 CPU cores each.

Best Regards,
Artem
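
For illustration, the open/close test described above could be sketched roughly as follows. This is not the actual test code from the cover letter. It assumes clock_gettime(CLOCK_MONOTONIC) as the timer (the mail does not say which timer was used) and uses SYS_openat rather than SYS_open so that it builds on architectures without a plain open syscall. The helper names now_ns() and measure_open_close_ns() are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for the syscall() declaration in <unistd.h> */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>	/* Definition of SYS_* constants */
#include <time.h>
#include <unistd.h>

/* Assumption: CLOCK_MONOTONIC is the time source; the mail names none. */
int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Open and close /dev/null once per iteration through the glibc
 * syscall() wrapper. Returns the elapsed time for the whole loop in
 * nanoseconds, or -1 if any system call fails.
 */
int64_t measure_open_close_ns(int iterations)
{
	int64_t start = now_ns();

	for (int i = 0; i < iterations; i++) {
		long fd = syscall(SYS_openat, AT_FDCWD, "/dev/null", O_RDONLY);

		if (fd < 0)
			return -1;
		if (syscall(SYS_close, (int)fd) < 0)
			return -1;
	}
	return now_ns() - start;
}
```

Calling measure_open_close_ns(1024) from main() and printing the result gives the total loop time; the other loop-based tests (fork/1024, mmap/munmap) would follow the same pattern with a different loop body.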