Thread (10 messages) 10 messages, 3 authors, 2021-09-16

Re: [PATCH 1/3] perf: Add macros to specify onchip L2/L3 accesses

From: Michael Ellerman <mpe@ellerman.id.au>
Date: 2021-09-14 10:40:48
Also in: linux-perf-users, lkml

Peter Zijlstra [off-list ref] writes:
On Thu, Sep 09, 2021 at 10:45:54PM +1000, Michael Ellerman wrote:
quoted
quoted
The 'new' composite doesnt have a hops field because the hardware that
nessecitated that change doesn't report it, but we could easily add a
field there.

Suppose we add, mem_hops:3 (would 6 hops be too small?) and the
corresponding PERF_MEM_HOPS_{NA, 0..6}
It's really 7 if we use remote && hop = 0 to mean the first hop.
I don't think we can do that, becaus of backward compat. Currently:

  lvl_num=DRAM, remote=1

denites: "Remote DRAM of any distance". Effectively it would have the new
hops field filled with zeros though, so if you then decode with the hops
field added it suddenly becomes:

 lvl_num=DRAM, remote=1, hops=0

and reads like: "Remote DRAM of 0 hops" which is quite daft. Therefore 0
really must denote a 'N/A'.
Ah yeah, duh, it needs to be backward compatible.
quoted
If we're wanting to use some of the hop levels to represent
intra-chip/package hops then we could possibly use them all on a really
big system.

eg. you could imagine something like:

 L2 | 		        - local L2
 L2 | REMOTE | HOPS_0	- L2 of neighbour core
 L2 | REMOTE | HOPS_1	- L2 of near core on same chip (same 1/2 of chip)
 L2 | REMOTE | HOPS_2	- L2 of far core on same chip (other 1/2 of chip)
 L2 | REMOTE | HOPS_3	- L2 of sibling chip in same package
 L2 | REMOTE | HOPS_4	- L2 on separate package 1 hop away
 L2 | REMOTE | HOPS_5	- L2 on separate package 2 hops away
 L2 | REMOTE | HOPS_6	- L2 on separate package 3 hops away


Whether it's useful to represent all those levels I'm not sure, but it's
probably good if we have the ability.
I'm thinking we ought to keep hops as steps along the NUMA fabric, with
0 hops being the local node. That only gets us:

 L2, remote=0, hops=HOPS_0 -- our L2
 L2, remote=1, hops=HOPS_0 -- L2 on the local node but not ours
 L2, remote=1, hops!=HOPS_0 -- L2 on a remote node
Hmm. I'm not sure about tying it directly to NUMA hops. I worry we're
going to see more and more systems where there's a hierarchy within the
chip/package, in addition to the traditional NUMA hierarchy.

Although then I guess it becomes a question of what exactly is a NUMA
hop, maybe the answer is that on those future systems those
intra-chip/package hops should be represented as NUMA hops.

It's not like we have a hard definition of what a NUMA hop is?
quoted
I guess I'm 50/50 on whether that's enough levels, or whether we want
another bit to allow for future growth.
Right, possibly safer to add one extra bit while we can.... I suppose.
Equally it's not _that_ hard to add another bit later (if there's still
one free), makes the API a little uglier to use, but not the end of the
world.

cheers
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help