Thread: 24 messages, 5 authors, 2024-01-30

Re: [PATCH RFC 02/12] mm: add config option and per-NUMA node VMS support

From: Artem Kuzin <hidden>
Date: 2024-01-09 16:57:25
Also in: linux-mm

On 1/3/2024 10:43 PM, Christoph Lameter (Ampere) wrote:
On Thu, 28 Dec 2023, artem.kuzin@huawei.com wrote:
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -626,7 +628,14 @@ struct mm_struct {
        unsigned long mmap_compat_legacy_base;
#endif
        unsigned long task_size;    /* size of task vm space */
-        pgd_t * pgd;
+#ifndef CONFIG_KERNEL_REPLICATION
+        pgd_t *pgd;
+#else
+        union {
+            pgd_t *pgd;
+            pgd_t *pgd_numa[MAX_NUMNODES];
+        };
+#endif

Hmmm... This is adding the pgd pointers for all mm_structs. But we only need the NUMA pgd pointers for the init_mm. Can this be a separate variable? There are some architectures with a larger number of nodes.

Hi, Christoph.

Sorry for the delayed reply.

We already have a per-NUMA node init_mm, but that is not enough.
We need this array of pointers in mm_struct because the threads of a process that occupies more than one NUMA node must each use the proper (per-NUMA node) pgd.
On x86 we have one translation table per process, and it contains both the kernel and the user space parts. With kernel text and rodata replication enabled, we need to take
the per-NUMA node kernel text and rodata replicas into account during context switches and similar operations. For example, if a particular thread runs a system call, we need to use the
kernel replica that corresponds to the NUMA node the thread is running on. At the same time, the process can occupy several NUMA nodes, and threads running on different
NUMA nodes should observe one user space version but different kernel versions (per-NUMA node replicas).
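
To make the point above concrete, here is a minimal user-space sketch of how a per-node pgd could be picked for a thread. It is only an illustration, not code from the patch set: `struct mm_sketch`, `mm_pgd_for_node()` and the small `MAX_NUMNODES` value are assumptions made for the example, and `pgd_t` is a stand-in for the kernel type.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NUMNODES 4                 /* assumption: tiny value for the demo */

typedef struct { unsigned long val; } pgd_t;   /* stand-in for the kernel type */

/* Hypothetical reduced mm_struct: only the union from the quoted hunk. */
struct mm_sketch {
    union {
        pgd_t *pgd;                    /* aliases pgd_numa[0] */
        pgd_t *pgd_numa[MAX_NUMNODES]; /* one top-level table per NUMA node */
    };
};

/*
 * Hypothetical helper: return the top-level table a thread should use
 * when it runs on node 'nid'. In this model every per-node table maps
 * the same user space but a different kernel text/rodata replica.
 */
static pgd_t *mm_pgd_for_node(struct mm_sketch *mm, int nid)
{
    assert(nid >= 0 && nid < MAX_NUMNODES);
    return mm->pgd_numa[nid];
}
```

Because of the union, code that still reads `mm->pgd` sees the node 0 table, while a context switch path could pass the node of the target CPU to the helper.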

But you are right that this place should be optimized. We do not need this array for processes that are not expected to run across NUMA nodes. Possibly, we
need to implement some "lazy" approach to allocating the per-NUMA node translation tables. The current version of kernel replication support is implemented so as
to keep everything as simple as possible.
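
One possible shape of such a "lazy" scheme, modeled in user space, is sketched below. The thread does not specify a design, so everything here is an assumption: the names `mm_sketch`, `mm_get_pgd_lazy()`, `kernel_replica`, and the constants `PTRS_PER_PGD` and `KERNEL_PGD_BOUNDARY` are hypothetical, and `calloc()` stands in for the real page-table allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NUMNODES        4   /* assumption: tiny values for the demo */
#define PTRS_PER_PGD        8
#define KERNEL_PGD_BOUNDARY 4   /* entries at or above this index map the kernel */

typedef struct { unsigned long val; } pgd_t;   /* stand-in for the kernel type */

struct mm_sketch {
    pgd_t *pgd_numa[MAX_NUMNODES];  /* NULL until a thread first runs on that node */
};

/* Hypothetical per-node kernel halves (text/rodata replicas). */
static pgd_t kernel_replica[MAX_NUMNODES][PTRS_PER_PGD - KERNEL_PGD_BOUNDARY];

/* Return the table for node 'nid', allocating it on first use. */
static pgd_t *mm_get_pgd_lazy(struct mm_sketch *mm, int nid)
{
    assert(nid >= 0 && nid < MAX_NUMNODES);
    if (mm->pgd_numa[nid])
        return mm->pgd_numa[nid];

    pgd_t *pgd = calloc(PTRS_PER_PGD, sizeof(*pgd));
    if (!pgd)
        return NULL;

    /* User half: copy node 0's entries so all nodes see one user space. */
    if (mm->pgd_numa[0])
        memcpy(pgd, mm->pgd_numa[0], KERNEL_PGD_BOUNDARY * sizeof(*pgd));

    /* Kernel half: take the entries of this node's replica. */
    memcpy(pgd + KERNEL_PGD_BOUNDARY, kernel_replica[nid],
           sizeof(kernel_replica[nid]));

    mm->pgd_numa[nid] = pgd;
    return pgd;
}
```

A process that never leaves one node would then pay for a single table only. The sketch ignores locking and keeping the user halves in sync after later updates, both of which a real implementation would have to handle.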

Thank you!

Best regards,
Artem