Thread (35 messages), 5 authors, 2017-06-14

Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc

From: Michael Bringmann <hidden>
Date: 2017-05-24 23:55:15
Also in: lkml


On 05/24/2017 06:19 AM, Michael Ellerman wrote:
Michael Bringmann [off-list ref] writes:
On 05/23/2017 04:49 PM, Reza Arbab wrote:
On Tue, May 23, 2017 at 03:05:08PM -0500, Michael Bringmann wrote:
On 05/23/2017 10:52 AM, Reza Arbab wrote:
On Tue, May 23, 2017 at 10:15:44AM -0500, Michael Bringmann wrote:
+static void setup_nodes(void)
+{
+    int i, l = 32 /* MAX_NUMNODES */;
+
+    for (i = 0; i < l; i++) {
+        if (!node_possible(i)) {
+            setup_node_data(i, 0, 0);
+            node_set(i, node_possible_map);
+        }
+    }
+}
This seems to be a workaround for 3af229f2071f ("powerpc/numa: Reset node_possible_map to only node_online_map").
They may be related, but that commit is not a replacement.  The above patch ensures that
there are enough of the nodes initialized at startup to allow for memory hot-add into a
node that was not used at boot.  (See 'setup_node_data' function in 'numa.c'.)  That and
recording that the node was initialized.
Is it really necessary to preinitialize these empty nodes using setup_node_data()?
When you do memory hotadd into a node that was not used at boot, the node data
already gets set up by:

add_memory
 add_memory_resource
   hotadd_new_pgdat
     arch_alloc_nodedata <-- allocs the pg_data_t
     ...
     free_area_init_node <-- sets NODE_DATA(nid)->node_id, etc.

Removing setup_node_data() from that loop leaves only the call to node_set(). If 3af229f2071f (which reduces node_possible_map) was reverted, you wouldn't need to do that either.
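
The argument above (per-node data is created on demand during hot-add, so it need not be preinitialized at boot) can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every toy_* name, the struct layout, and the return codes are hypothetical stand-ins chosen for illustration and are not the kernel's actual functions or types.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_MAX_NODES 32                /* hypothetical stand-in for MAX_NUMNODES */

struct toy_pgdat {                      /* hypothetical stand-in for pg_data_t */
    int node_id;
    unsigned long present_pages;
};

/* Hypothetical stand-in for NODE_DATA(nid); NULL means "not set up yet". */
static struct toy_pgdat *toy_node_data[TOY_MAX_NODES];

/* Plays the role of hotadd_new_pgdat(): allocate and initialise on demand. */
static struct toy_pgdat *toy_hotadd_new_pgdat(int nid)
{
    struct toy_pgdat *p = calloc(1, sizeof(*p));

    if (!p)
        return NULL;
    p->node_id = nid;
    toy_node_data[nid] = p;
    return p;
}

/* Plays the role of add_memory(): returns 0 on success, -1 on failure. */
static int toy_add_memory(int nid, unsigned long pages)
{
    struct toy_pgdat *p;

    assert(nid >= 0 && nid < TOY_MAX_NODES);
    p = toy_node_data[nid];
    if (!p)                             /* node unused so far: set it up now */
        p = toy_hotadd_new_pgdat(nid);
    if (!p)
        return -1;
    p->present_pages += pages;
    return 0;
}
```

In this model a node that was empty at boot has no data until the first toy_add_memory() call for it, which is the behaviour the call chain above describes for the real hot-add path.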
With or without 3af229f2071f, we would still need to add something, somewhere to add new
bits to the 'node_possible_map'.  That is not being done.
You mustn't add bits to the possible map after boot.

That's its purpose, to tell you what nodes could ever *possibly* exist.
The problem that I have been encountering is that the 'possible map' did *not*
show all of the possible nodes.  Rather, it showed only the nodes that were
assigned memory at boot-up.  If more memory were hot-added to the kernel, it
could be assigned into one of the nodes that were skipped at boot.  However,
nothing was updating the 'node_possible_map' correctly in the kernel memory
code.
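
The failure mode described above can be shown with a toy bitmask model. This is a sketch, not kernel code: the toy_* names, the 32-bit masks, and the shrink_possible flag are assumptions made for illustration. The flag models a boot path that reduces the possible map to the nodes that are online at boot; hot-add itself never adds bits to the possible map.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_NODES 32                /* hypothetical stand-in for MAX_NUMNODES */

struct toy_node_maps {
    uint32_t possible;                  /* bit n set: node n may ever exist */
    uint32_t online;                    /* bit n set: node n has memory now */
};

static uint32_t toy_bit(int nid)
{
    assert(nid >= 0 && nid < TOY_MAX_NODES);
    return (uint32_t)1 << nid;
}

/*
 * Boot-time setup. 'described' is the set of nodes the platform reports,
 * 'with_memory' the set that has memory at boot. If shrink_possible is
 * non-zero, the possible map is cut down to the online map.
 */
static void toy_boot(struct toy_node_maps *m, uint32_t described,
                     uint32_t with_memory, int shrink_possible)
{
    m->online = with_memory & described;
    m->possible = shrink_possible ? m->online : described;
}

/* Hot-add leaves 'possible' untouched; returns -1 for a non-possible node. */
static int toy_hot_add(struct toy_node_maps *m, int nid)
{
    if (!(m->possible & toy_bit(nid)))
        return -1;
    m->online |= toy_bit(nid);
    return 0;
}
```

With nodes 0-3 described and only nodes 0-1 populated at boot, a hot-add into node 2 is rejected in this model when the possible map was shrunk, and accepted when it was left as described.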

Reza pointed out a code change in commit 3af229f2071f that has not made it into
the 4.12 checkout, i.e., removing the instruction that reduces the
node_possible_map. That may well be a suitable replacement for the code that I
have here, and I will test it next.
cheers
Later.

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line  363-5196
External: (512) 286-5196
Cell:       (512) 466-0650
mwb@linux.vnet.ibm.com