Thread (36 messages) 36 messages, 4 authors, 2020-04-10

Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA

From: Mike Rapoport <hidden>
Date: 2020-03-30 17:51:18
Also in: linux-arm-kernel, linux-mm, linux-s390, lkml, sparclinux
Subsystem: memory management, memory management - page allocator, the rest · Maintainers: Andrew Morton, Vlastimil Babka, Linus Torvalds

On Mon, Mar 30, 2020 at 09:42:46AM +0200, Michal Hocko wrote:
On Sat 28-03-20 11:31:17, Hoan Tran wrote:
quoted
In NUMA layout which nodes have memory ranges that span across other nodes,
the mm driver can detect the memory node id incorrectly.

For example, with layout below
Node 0 address: 0000 xxxx 0000 xxxx
Node 1 address: xxxx 1111 xxxx 1111

Note:
 - Memory from low to high
 - 0/1: Node id
 - x: Invalid memory of a node

When mm probes the memory map, without CONFIG_NODES_SPAN_OTHER_NODES
config, mm only checks the memory validity but not the node id.
Because of that, Node 1 also detects the memory from node 0 as below
when it scans from the start address to the end address of node 1.

Node 0 address: 0000 xxxx xxxx xxxx
Node 1 address: xxxx 1111 1111 1111

This layout could occur on any architecture. Most of them enables
this config by default with CONFIG_NUMA. This patch, by default, enables
CONFIG_NODES_SPAN_OTHER_NODES or uses early_pfn_in_nid() for NUMA.
I am not opposed to this at all. It reduces the config space and that is
a good thing on its own. The history has shown that meory layout might
be really wild wrt NUMA. The config is only used for early_pfn_in_nid
which is clearly an overkill.

Your description doesn't really explain why this is safe though. The
history of this config is somehow messy, though. Mike has tried
to remove it a94b3ab7eab4 ("[PATCH] mm: remove arch independent
NODES_SPAN_OTHER_NODES") just to be reintroduced by 7516795739bd
("[PATCH] Reintroduce NODES_SPAN_OTHER_NODES for powerpc") without any
reasoning what so ever. This doesn't make it really easy see whether
reasons for reintroduction are still there. Maybe there are some subtle
dependencies. I do not see any TBH but that might be burried deep in an
arch specific code.
I've looked at this a bit more and it seems that the check for
early_pfn_in_nid() in memmap_init_zone() can be simply removed.

The commits you've mentioned were way before the addition of
HAVE_MEMBLOCK_NODE_MAP and the whole infrastructure that calculates zone
sizes and boundaries based on the memblock node map.
So, the memmap_init_zone() is called when zone boundaries are already
within a node.

I don't have access to machines with memory layout that required this check
at the first place, so if anybody who does could test the change below on
such machine it would be great.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..6d3eb0901864 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5908,10 +5908,6 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
                                pfn = next_pfn(pfn);
                                continue;
                        }
-                       if (!early_pfn_in_nid(pfn, nid)) {
-                               pfn++;
-                               continue;
-                       }
                        if (overlap_memmap_init(zone, &pfn))
                                continue;
                        if (defer_init(nid, pfn, end_pfn))

 
quoted
v3:
 * Revise the patch description

V2:
 * Revise the patch description

Hoan Tran (5):
  mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA
  powerpc: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES
  x86: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES
  sparc: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES
  s390: Kconfig: Remove CONFIG_NODES_SPAN_OTHER_NODES

 arch/powerpc/Kconfig | 9 ---------
 arch/s390/Kconfig    | 8 --------
 arch/sparc/Kconfig   | 9 ---------
 arch/x86/Kconfig     | 9 ---------
 mm/page_alloc.c      | 2 +-
 5 files changed, 1 insertion(+), 36 deletions(-)

-- 
1.8.3.1
-- 
Michal Hocko
SUSE Labs
-- 
Sincerely yours,
Mike.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help