Re: [RFC][PATCH 05/13] mm/numa: automatically generate node migration order
From: Yang Shi <hidden>
Date: 2021-02-02 17:50:19
Also in:
lkml
On Mon, Feb 1, 2021 at 11:13 AM Dave Hansen [off-list ref] wrote:
On 1/29/21 12:46 PM, Yang Shi wrote: ...quoted
quoted
int next_demotion_node(int node) { - return node_demotion[node]; + /* + * node_demotion[] is updated without excluding + * this function from running. READ_ONCE() avoids + * reading multiple, inconsistent 'node' values + * during an update. + */Don't we need a smp_rmb() here? The single write barrier might be not enough in migration target set. Typically a write barrier should be used in pairs with a read barrier.I don't think we need one, practically. Since there is no locking against node_demotion[] updates, although a smp_rmb() would ensure that this read is up-to-date, it could change freely after the smp_rmb().
Yes, but this should be able to guarantee we see "disable + after" state. Isn't it more preferred?
In other words, smp_rmb() would shrink the window where a "stale" read could occur but would not eliminate it.quoted
quoted
+ return READ_ONCE(node_demotion[node]);Why not consolidate the patch #4 in this patch? The patch #4 just add the definition of node_demotion and the function, then the function is changed in this patch, and the function is not used by anyone between the adding and changing.I really wanted to highlight that the locking scheme and the READ_ONCE() (or lack thereof) was specifically due to how node_demotion[] was being updated. The READ_ONCE() is not, for instance, inherent to the data structure. ...quoted
quoted
+/* + * When memory fills up on a node, memory contents can be + * automatically migrated to another node instead of + * discarded at reclaim. + * + * Establish a "migration path" which will start at nodes + * with CPUs and will follow the priorities used to build the + * page allocator zonelists. + * + * The difference here is that cycles must be avoided. If + * node0 migrates to node1, then neither node1, nor anything + * node1 migrates to can migrate to node0. + * + * This function can run simultaneously with readers of + * node_demotion[]. However, it can not run simultaneously + * with itself. Exclusion is provided by memory hotplug events + * being single-threaded.Maybe an example diagram for the physical topology and how the migration target is generated in the comment seems helpful to understand the code.Sure. Were you thinking of a code comment, or enhanced changelog?
I'd prefer a code comment.
Let's say there's a system with two sockets each with the same three
classes of memory: fast, medium and slow. Each memory class is placed
in its own NUMA node and the CPUs are attached to the fast memory. That
leaves 6 NUMA nodes (0-5):
Socket A: 0, 1, 2
Socket B: 3, 4, 5
The migration path for this configuration path would start on the nodes
with the processors and fast memory, progress through medium and end
with the slow memory:
0 -> 1 -> 2 -> stop
3 -> 4 -> 5 -> stop
This is represented in the node_demotion[] like this:
{ 1, // Node 0 migrates to 1
2, // Node 1 migrates to 2
-1, // Node 2 does not migrate
4, // Node 3 migrates to 1
5, // Node 4 migrates to 2
-1} // Node 5 does not migrate
Is that what you were thinking of?Perfect.
...quoted
quoted
+again: + this_pass = next_pass; + next_pass = NODE_MASK_NONE; + /* + * To avoid cycles in the migration "graph", ensure + * that migration sources are not future targets by + * setting them in 'used_targets'. Do this only + * once per pass so that multiple source nodes can + * share a target node. + * + * 'used_targets' will become unavailable in future + * passes. This limits some opportunities for + * multiple source nodes to share a desintation.s/desination/destinationFixed, thanks.quoted
quoted
+ */ + nodes_or(used_targets, used_targets, this_pass); + for_each_node_mask(node, this_pass) { + int target_node = establish_migrate_target(node, &used_targets); + + if (target_node == NUMA_NO_NODE) + continue; + + /* Visit targets from this pass in the next pass: */ + node_set(target_node, next_pass); + } + /* Is another pass necessary? */ + if (!nodes_empty(next_pass)) + goto again; +} + +void set_migration_target_nodes(void)It seems this function is not called outside migrate.c, so it should be static.Fixed, thanks.