Thread: 24 messages, 5 authors, 2024-01-30

Re: [PATCH RFC 04/12] x86: add support of memory protection for NUMA replicas

From: a00561249@china.huawei.com <hidden>
Date: 2024-01-09 15:53:10
Also in: linux-mm

Hi Shivank,
Thanks a lot for the comments and findings. I've fixed the build and plan to update the patch set soon.

On 1/9/2024 9:46 AM, Garg, Shivank wrote:
Hi Artem,

I hope this message finds you well.
I've encountered a compilation issue when KERNEL_REPLICATION is disabled in the config.

ld: vmlinux.o: in function `alloc_insn_page':
/home/amd/linux_mainline/arch/x86/kernel/kprobes/core.c:425: undefined reference to `numa_set_memory_rox'
ld: vmlinux.o: in function `alloc_new_pack':
/home/amd/linux_mainline/kernel/bpf/core.c:873: undefined reference to `numa_set_memory_rox'
ld: vmlinux.o: in function `bpf_prog_pack_alloc':
/home/amd/linux_mainline/kernel/bpf/core.c:891: undefined reference to `numa_set_memory_rox'
ld: vmlinux.o: in function `bpf_trampoline_update':
/home/amd/linux_mainline/kernel/bpf/trampoline.c:447: undefined reference to `numa_set_memory_rox'
ld: vmlinux.o: in function `bpf_struct_ops_map_update_elem':
/home/amd/linux_mainline/kernel/bpf/bpf_struct_ops.c:515: undefined reference to `numa_set_memory_rox'
ld: vmlinux.o:/home/amd/linux_mainline/kernel/bpf/bpf_struct_ops.c:524: more undefined references to `numa_set_memory_rox' follow


After some investigation, I've put together a patch that resolves this compilation issue for me.
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -2268,6 +2268,15 @@ int numa_set_memory_nonglobal(unsigned long addr, int numpages)

        return ret;
 }
+
+#else
+
+int numa_set_memory_rox(unsigned long addr, int numpages)
+{
+       return set_memory_rox(addr, numpages);
+
+}
+
 #endif
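
A common alternative to an out-of-line fallback is a static inline stub in a header. The sketch below is a user-space mock-up of that pattern, not part of the posted patch: the `CONFIG_KERNEL_REPLICATION` symbol name is inferred from the config option mentioned above, and `set_memory_rox()` is replaced by a dummy so the fragment compiles on its own.

```c
/* Dummy stand-in for the kernel's set_memory_rox(); assumption for this sketch only. */
static int set_memory_rox(unsigned long addr, int numpages)
{
    (void)addr;
    (void)numpages;
    return 0;
}

#ifdef CONFIG_KERNEL_REPLICATION
/* Replication enabled: the real implementation is provided elsewhere. */
int numa_set_memory_rox(unsigned long addr, int numpages);
#else
/* Replication disabled: fall back to the non-replicated helper. */
static inline int numa_set_memory_rox(unsigned long addr, int numpages)
{
    return set_memory_rox(addr, numpages);
}
#endif
```

With the stub visible to every caller, code such as `alloc_insn_page` would resolve `numa_set_memory_rox` at compile time whether or not the option is set.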

Additionally, I'm interested in evaluating the performance impact of this patchset on AMD processors.
Could you please point me to the benchmarks that you used in the cover letter?

Best Regards,
Shivank

Regarding the benchmarks: for now we used a self-implemented test that generates a system-call load.
We used the RedHawk Linux approach as a reference.

The "An Overview of Kernel Text Page Replication in RedHawk™ Linux® 6.3" article was used.
https://concurrent-rt.com/wp-content/uploads/2020/12/kernel-page-replication.pdf

The test is very simple.
All measured system calls were invoked through the syscall() wrapper from glibc, e.g.

#include <sys/syscall.h>      /* Definition of SYS_* constants */
#include <unistd.h>
 
long syscall(long number, ...);
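
For illustration, a single call through this wrapper could be timed roughly as follows. This is a minimal sketch, not the actual test code: the choice of `SYS_getpid`, the use of `clock_gettime(CLOCK_MONOTONIC)`, and the helper names `now_ns` and `time_one_syscall` are all assumptions.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper: invoke one system call through syscall() and
 * return the elapsed time in nanoseconds. The syscall's return value
 * is stored in *result.
 */
static long long time_one_syscall(long *result)
{
    long long start = now_ns();

    *result = syscall(SYS_getpid);
    return now_ns() - start;
}
```

The measured interval includes the overhead of the second clock reading, which matters for single-call measurements.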

fork/1
    The system call is invoked only once. The time between entering and exiting the system call is measured.
fork/1024
    The system call is invoked in a loop 1024 times. The time between entering the loop and exiting it is measured.
mmap/munmap
    A set of 1024 pages (PAGE_SIZE defaults to 4096 if it is not defined) is mapped using the mmap syscall
    and unmapped using munmap. One page is mapped and unmapped per loop iteration.
mmap/lock
    The same as above, but with the MAP_LOCKED flag added.
open/close
    The /dev/null pseudo-file is opened and closed in a loop 1024 times, once per iteration.
mount
    The procfs pseudo-filesystem is mounted once to a temporary directory inside /tmp.
    The time between entering and exiting the system call is measured.
kill
    A signal handler for SIGUSR1 is set up. The signal is sent to a child process created with glibc's fork wrapper.
    The time between sending and receiving the SIGUSR1 signal is measured.
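
The loop-based measurements described above could be sketched as follows. This is a rough approximation under stated assumptions, not the code behind the cover-letter numbers: the helper names, the `ITERATIONS` constant and the `CLOCK_MONOTONIC` timer are hypothetical, and the mmap test calls the glibc `mmap()`/`munmap()` wrappers directly instead of `syscall()` to stay portable.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif

#define ITERATIONS 1024

/* Hypothetical helper: monotonic clock reading in nanoseconds. */
static long long now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * mmap/munmap: map and unmap one anonymous page per iteration.
 * Pass MAP_LOCKED in extra_flags for the mmap/lock variant.
 * Returns the elapsed time in nanoseconds, or -1 on failure.
 */
static long long bench_mmap_munmap(int extra_flags)
{
    long long start = now_ns();

    for (int i = 0; i < ITERATIONS; i++) {
        void *p = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
        if (p == MAP_FAILED)
            return -1;
        if (munmap(p, PAGE_SIZE) != 0)
            return -1;
    }
    return now_ns() - start;
}

/*
 * open/close: open and close /dev/null once per iteration.
 * Returns the elapsed time in nanoseconds, or -1 on failure.
 */
static long long bench_open_close(void)
{
    long long start = now_ns();

    for (int i = 0; i < ITERATIONS; i++) {
        long fd = syscall(SYS_openat, AT_FDCWD, "/dev/null", O_RDONLY);

        if (fd < 0)
            return -1;
        if (syscall(SYS_close, fd) != 0)
            return -1;
    }
    return now_ns() - start;
}
```

Calling `bench_mmap_munmap(MAP_LOCKED)` may fail if the process's RLIMIT_MEMLOCK limit is too low, in which case the function returns -1.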

Testing environment:
    Processor: Intel(R) Xeon(R) CPU E5-2690
    2 nodes with 12 CPU cores each.

Best Regards,
Artem