Thread: 13 messages, 4 authors, 2021-09-03

Re: [FIX PATCH 2/2] mm/page_alloc: Use accumulated load when building node fallback list

From: Bharata B Rao <hidden>
Date: 2021-09-03 04:44:24
Also in: lkml

On 8/30/2021 5:46 PM, Bharata B Rao wrote:
From: Krupa Ramakrishnan <redacted>

In build_zonelists(), when the fallback list is built for the nodes,
the node load gets reinitialized during each iteration. As a result,
nodes at the same distance occupy the same slot in different node
fallback lists rather than rotating in the intended round-robin
manner, so one node gets picked for allocation more often than other
nodes at the same distance.

As an example, consider a 4 node system with the following distance
matrix.

Node 0  1  2  3
----------------
0    10 12 32 32
1    12 10 32 32
2    32 32 10 12
3    32 32 12 10

For this case, the node fallback list gets built like this:

Node	Fallback list
---------------------
0	0 1 2 3
1	1 0 3 2
2	2 3 0 1
3	3 2 0 1 <-- Unexpected fallback order

FWIW, for a dual-socket 8 node system with the following distance matrix,

node   0   1   2   3   4   5   6   7
  0:  10  12  12  12  32  32  32  32
  1:  12  10  12  12  32  32  32  32
  2:  12  12  10  12  32  32  32  32
  3:  12  12  12  10  32  32  32  32
  4:  32  32  32  32  10  12  12  12
  5:  32  32  32  32  12  10  12  12
  6:  32  32  32  32  12  12  10  12
  7:  32  32  32  32  12  12  12  10

the fallback list looks like this:

Before
=======
Fallback order for Node 0: 0 1 2 3 4 5 6 7
Fallback order for Node 1: 1 2 3 0 5 6 7 4
Fallback order for Node 2: 2 3 0 1 6 7 4 5
Fallback order for Node 3: 3 0 1 2 7 4 5 6
Fallback order for Node 4: 4 5 6 7 0 1 2 3
Fallback order for Node 5: 5 6 7 4 0 1 2 3
Fallback order for Node 6: 6 7 4 5 0 1 2 3
Fallback order for Node 7: 7 4 5 6 0 1 2 3

After the fix
==============
Fallback order for Node 0: 0 1 2 3 4 5 6 7
Fallback order for Node 1: 1 2 3 0 5 6 7 4
Fallback order for Node 2: 2 3 0 1 6 7 4 5
Fallback order for Node 3: 3 0 1 2 7 4 5 6
Fallback order for Node 4: 4 5 6 7 0 1 2 3
Fallback order for Node 5: 5 6 7 4 1 2 3 0
Fallback order for Node 6: 6 7 4 5 2 3 0 1
Fallback order for Node 7: 7 4 5 6 3 0 1 2

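So the problem becomes more pronounced for bigger NUMA systems: before
the fix, nodes 4-7 all fall back to node 0 as their first remote node,
whereas after the fix the first remote choice rotates across nodes 0-3.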
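
For anyone who wants to experiment with other topologies, here is a
rough userspace model of the node ordering, written in Python rather
than taken from the kernel. It is only a sketch under a few assumptions:
every node has memory and CPUs (so the constant per-node CPU penalty is
dropped), MAX_NUMNODES is an arbitrary stand-in value, and the helper
names build_fallback_lists() and two_socket_matrix() are made up for
this illustration. The accumulate flag switches between overwriting the
node load (before) and adding to it (after the fix).

```python
from typing import List

MAX_NUMNODES = 64  # hypothetical stand-in for the kernel's compile-time limit


def two_socket_matrix(per_socket: int) -> List[List[int]]:
    """Distance matrix for two sockets: 10 local, 12 same socket, 32 remote."""
    n = 2 * per_socket
    return [[10 if a == b else 12 if a // per_socket == b // per_socket else 32
             for b in range(n)] for a in range(n)]


def build_fallback_lists(dist: List[List[int]], accumulate: bool) -> List[List[int]]:
    """Model the fallback order of every node, built in node order."""
    n = len(dist)
    node_load = [0] * n        # shared across the list builds of all nodes
    scale = n * MAX_NUMNODES   # keeps load a tie-breaker below distance
    orders = []
    for local in range(n):
        used = [False] * n
        order: List[int] = []
        load = n
        prev = local
        while len(order) < n:
            if not used[local]:
                best = local   # the local node always comes first
            else:
                best, best_val = -1, None
                for cand in range(n):
                    if used[cand]:
                        continue
                    # Distance plus a 1-point penalty for lower-numbered nodes
                    # (assumption: all nodes have CPUs, so no CPU penalty term).
                    val = dist[local][cand] + (1 if cand < local else 0)
                    val = val * scale + node_load[cand]
                    if best_val is None or val < best_val:
                        best, best_val = cand, val
            used[best] = True
            # Penalize the first node picked in each new distance group.
            if dist[local][best] != dist[local][prev]:
                if accumulate:
                    node_load[best] += load   # after the fix
                else:
                    node_load[best] = load    # before the fix
            order.append(best)
            prev = best
            load -= 1
        orders.append(order)
    return orders


if __name__ == "__main__":
    for label, accumulate in (("Before", False), ("After the fix", True)):
        print(label)
        lists = build_fallback_lists(two_socket_matrix(4), accumulate)
        for node, order in enumerate(lists):
            print(f"Fallback order for Node {node}:", *order)
```

Run against the 8 node matrix, this model gives the same Before and
After orders as listed above. With two_socket_matrix(2) it gives the
4 node case, where node 3 comes out as 3 2 0 1 when the load is
overwritten and as 3 2 1 0 when it is accumulated.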

Regards,
Bharata.