Re: [RFC PATCH 1/2] powerpc/numa: Introduce logical numa id

From: Aneesh Kumar K.V <hidden>
Date: 2020-08-09 14:23:56

On 8/8/20 2:15 AM, Nathan Lynch wrote:

"Aneesh Kumar K.V" [off-list ref] writes:

quoted

On 8/7/20 9:54 AM, Nathan Lynch wrote:

quoted

"Aneesh Kumar K.V" [off-list ref] writes:

quoted

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index e437a9ac4956..6c659aada55b 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c

@@ -221,25 +221,51 @@ static void initialize_distance_lookup_table(int nid,
   	}
   }
   
+static u32 nid_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] =  NUMA_NO_NODE};

It's odd to me to use MAX_NUMNODES for this array when it's going to be
indexed not by Linux's logical node IDs but by the platform-provided
domain number, which has no relation to MAX_NUMNODES.


I didn't want to dynamically allocate this. We could fetch
"ibm,max-associativity-domains" to find the size for that. The current
code does assume that the firmware group id will not exceed MAX_NUMNODES,
hence the array size was kept at MAX_NUMNODES. I do agree that it is
confusing. Maybe we can do #define MAX_AFFINITY_DOMAIN MAX_NUMNODES?

Well, consider:

- ibm,max-associativity-domains can change at runtime with LPM. This
   doesn't happen in practice yet, but we should probably start thinking
   about how to support that.
- The domain numbering isn't clearly specified to have any particular
   properties such as beginning at zero or forming a contiguous range.

While the current code likely contains assumptions contrary to these
points, a change such as this is an opportunity to think about whether
those assumptions can be reduced or removed. In particular I think it
would be good to gracefully degrade when the number of NUMA affinity
domains can exceed MAX_NUMNODES. Using the platform-supplied domain
numbers to directly index Linux data structures will make that
impossible.

So, maybe genradix or even xarray wouldn't actually be overengineering
here.

One of the challenges with such a data structure is that we initialize
the nid_map before the slab allocator is available. That means a
memblock-based allocation, and we would end up implementing such a
sparse data structure ourselves here.
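
To give a sense of what a hand-rolled sparse mapping might involve, here
is a minimal userspace sketch, not code from this series. It assumes a
fixed-capacity static table standing in for a memblock allocation, and
the names domain_table and sparse_domain_to_nid() are hypothetical, as is
the small MAX_NUMNODES value. Domain ids may be arbitrary and
non-contiguous, and lookups degrade to NUMA_NO_NODE once all logical
nodes are in use.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_NUMNODES 4      /* assumption: small value for illustration */
#define NUMA_NO_NODE (-1)

/* One (firmware domain id, logical nid) pair per logical node. */
struct domain_entry {
	uint32_t domain;
	int nid;
};

/* Static storage: usable before any dynamic allocator is up. */
static struct domain_entry domain_table[MAX_NUMNODES];
static size_t domain_count;

/* Hypothetical helper: map an arbitrary domain id to a logical nid. */
int sparse_domain_to_nid(uint32_t domain)
{
	size_t i;

	/* Linear search is fine for a table this small. */
	for (i = 0; i < domain_count; i++)
		if (domain_table[i].domain == domain)
			return domain_table[i].nid;

	/* Out of logical nodes: degrade instead of indexing out of range. */
	if (domain_count == MAX_NUMNODES)
		return NUMA_NO_NODE;

	domain_table[domain_count].domain = domain;
	domain_table[domain_count].nid = (int)domain_count;
	return domain_table[domain_count++].nid;
}
```

Even this simple form needs its own capacity handling and search logic,
which is the extra code the array approach avoids.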

As you mentioned above, the hypervisor currently keeps the maximum
affinity domain id below ibm,max-associativity-domains, so the
array-based nid_map we have here is sufficient. This keeps the code
simpler.

This will also allow us to switch to a sparser data structure in the
future, as you requested, because the main change in this series is the
use of firmware_group_id_to_nid(). The details of the data structure
used to track that mapping are internal to that function.
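
As a rough illustration of that encapsulation, here is a userspace
sketch, not the code from the patch. It assumes that logical node ids
are handed out sequentially on first lookup, uses the
MAX_AFFINITY_DOMAIN name floated earlier in this thread, and stores
nid + 1 in each slot so that plain zero-initialization means "unmapped"
(the patch itself initializes the array to NUMA_NO_NODE). The
MAX_NUMNODES value is made up for the example.

```c
#define MAX_NUMNODES        8   /* assumption: small value for illustration */
#define MAX_AFFINITY_DOMAIN MAX_NUMNODES
#define NUMA_NO_NODE        (-1)

/*
 * Indexed by firmware group id. Each slot holds logical nid + 1;
 * 0 means the group has not been mapped yet.
 */
static unsigned int nid_map[MAX_AFFINITY_DOMAIN];
static int nr_mapped;   /* number of logical nids handed out so far */

/*
 * Callers only see this function, so the array could later be replaced
 * by a sparse structure without touching them.
 */
int firmware_group_id_to_nid(int firmware_gid)
{
	/* Reject ids the array cannot represent. */
	if (firmware_gid < 0 || firmware_gid >= MAX_AFFINITY_DOMAIN)
		return NUMA_NO_NODE;

	if (nid_map[firmware_gid] == 0) {
		if (nr_mapped >= MAX_NUMNODES)
			return NUMA_NO_NODE;
		/* First sighting: assign the next free logical nid. */
		nid_map[firmware_gid] = (unsigned int)(++nr_mapped);
	}

	return (int)nid_map[firmware_gid] - 1;
}
```

With this shape, swapping the array for something sparse would only
change the body of the function, not its callers.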

-aneesh
