Thread: 20 messages, 4 authors, 2021-07-21

RE: About add an A64FX cache control function into resctrl

From: tan.shaopeng@fujitsu.com <hidden>
Date: 2021-05-25 08:52:59
Also in: lkml

Hi Reinette,

Sorry, I have not yet explained A64FX's sector cache function well.
I think I need to explain this function from a different perspective.
On 5/17/2021 1:31 AM, tan.shaopeng@fujitsu.com wrote:
Hi Reinette,

I’m sorry for the late reply.
I think I could not explain A64FX’s sector cache function well in my
first mail. While answering the question, I will also explain this
function in more detail. Though maybe you have already learned more
about this function by reading specification and manual, in order to
better understand this function, some contents may have duplicate
explanations.
The overview in section 12 was informative but very high level.
I'm considering how to answer the questions from your earlier email.
When I checked that email again, I realized that the information I
provided before was insufficient, and I am sorry for that.

To understand the sector cache function of A64FX, could you please
see A64FX_Microarchitecture_Manual - section 12. Sector Cache
https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Microarchitecture_Manual_en_1.4.pdf
and,
A64FX_Specification_HPC_Extension - section 1.2. Sector Cache
https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Specification_HPC_Extension_v1_EN.pdf
Thank you for the direct links - I missed that there are two documents
available.
After reading the spec portion it does seem to me even more as though
"sectors" could be considered the same as the resctrl "classes of
service". The Fujitsu hardware supports four sectors that can be
configured with different number of ways using the registers you
mention above. In resctrl this could be considered as hardware that
supports four classes of service and each class of service can be allocated
a different number of ways.
Fujitsu hardware supports four sectors that can be configured with
different number of ways by using "IMP_SCCR" registers, and when this
function is added into resctrl, the maximum ways of each sector are
indicated by bitmap.

However, A64FX's L2 cache setting registers are shared among PEs
(Processor Elements) in a NUMA. If two PEs in the same NUMA are assigned
to different resource groups, changing one PE's L2 setting in one
resource group will influence the other PE's L2 setting in the other
resource group. So, when adding this function into resctrl, we will
assign NUMA to the resource group. (On A64FX, each NUMA has 12 PEs,
and each PE has L1 cache setting registers, but these registers are
not shared.) There are 4 NUMAs on A64FX, so the 4 NUMAs could be considered
as hardware that supports at most four classes of service, where each
class of service has 4 sectors (4 L1 sectors & 4 L2 sectors) and each
sector can be allocated a different number of ways.
And, when running a task in a resource group, the [56:57] bits of the
virtual address are used for sector selection (cache affinity).
It is not clear to me why NUMA needs to be involved.

Processors sharing a cache, either L2 or L3 cache, is familiar and well
supported by resctrl.

My understanding of the sector cache feature is that each cache can be split
into multiple (4) sectors. It thus seems to me something specific to the cache
itself.

Let me try and give an example of my understanding based on the cache
architecture described in the A64FX Microarchitecture Manual.

I see in Figure 9-2 that each processor has an L1D as well as L1I Cache, and
twelve processors share an L2 cache. The L1D cache has 4 ways (0xF
bitmask) and L2 cache has 16 (0xFFFF bitmask) ways. From what I understand
the sector cache function is supported on L1D and L2.

First, the goal would be to discover all the caches on the system, since the
sectors need to be programmed on each cache. On the system with 48 cores
there would thus be 48 L1D caches, and 4 L2 caches.

Let's start by assigning the caches IDs: the L1D caches are numbered from 0 to
47 and the L2 caches numbered from 0 to 3.

My understanding is that the goal is to program these sectors using resctrl.
Each cache instance can have a maximum of four sectors, and they cannot overlap. (I do
not know if each sector has to have some portion of cache associated with it or
if a sector is allowed to be "empty").

So, what is needed is, for example, to have a way to say: "sector 0 on cache L1D
with id X is assigned Y ways", "sector 1 on cache L2 with id Z is assigned XX
ways". Is this correct?

If my understanding is correct then you can do this with resctrl as follows (I am
making many assumptions on behavior here, especially regarding how many
ways a sector is required to have, but I hope this could be a baseline to evaluate
and correct my understanding and build on how this could be supported):

On boot all cache ways on all cache instances belong to sector 0:

# cd /sys/fs/resctrl/
# cat schemata
L1D:0=0xf;1=0xf;2=0xf;.....;47=0xf
L2:0=0xffff;1=0xffff;2=0xffff;3=0xffff
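
To make the boot-time state above concrete, here is a minimal Python sketch that builds the two schemata lines in full, assuming 48 L1D caches with 4 ways and 4 L2 caches with 16 ways as described earlier. The function names `full_mask` and `default_schemata` are hypothetical and are not part of resctrl; the sketch only formats strings and does not touch /sys/fs/resctrl.

```python
def full_mask(ways: int) -> int:
    """Return a bitmask with the lowest `ways` bits set, e.g. 4 -> 0xf."""
    return (1 << ways) - 1


def default_schemata(resource: str, num_caches: int, ways: int) -> str:
    """Build one schemata line in which every cache instance owns all ways."""
    mask = full_mask(ways)
    entries = ";".join(f"{cache_id}={mask:#x}" for cache_id in range(num_caches))
    return f"{resource}:{entries}"


# Assumed topology: 48 L1D caches (4 ways each), 4 L2 caches (16 ways each).
print(default_schemata("L1D", 48, 4))
print(default_schemata("L2", 4, 16))
```
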

Create sector1 and assign half of all cache ways to it:
(In support of this it would be required that resctrl resource groups are
exclusive. Exclusive resource groups are already supported but are not the
default, as is needed here.)

First, to provide cache ways to sector 1, the cache ways need to be removed
from sector 0:
(I am not sure if specific ways can be assigned to a sector or just a number of
ways, both could be supported)
# echo 'L1D:0=0x3;1=0x3;...;47=0x3' > /sys/fs/resctrl/schemata
# echo 'L2:0=0xff;1=0xff;2=0xff;3=0xff' > /sys/fs/resctrl/schemata

Now create sector1 (alternatively all sectors could exist on boot for this
system):
# mkdir /sys/fs/resctrl/sector1
# echo 'L1D:0=0x3;1=0x3;...;47=0x3' > /sys/fs/resctrl/sector1/schemata
# echo 'L2:0=0xff;1=0xff;2=0xff;3=0xff' > /sys/fs/resctrl/sector1/schemata

At this point there are two sectors configured. Configuration of sector0 can be
found in /sys/fs/resctrl/schemata and configuration of sector1 in
/sys/fs/resctrl/sector1/schemata
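
The example above writes the same mask to both groups, which fits the "just a number of ways" reading. Under the other reading, where specific ways are assigned and groups are exclusive, the two masks must not overlap. The following Python sketch shows that arithmetic only; `split_ways` is a hypothetical helper, and whether A64FX allows choosing specific ways is an open question in this thread.

```python
from typing import Tuple


def split_ways(total_ways: int, ways_for_first: int) -> Tuple[int, int]:
    """Split a cache's ways into two non-overlapping bitmasks.

    The first mask takes the lowest `ways_for_first` bits and the second
    mask takes whatever remains, so the two can never overlap.
    """
    if not 0 <= ways_for_first <= total_ways:
        raise ValueError("ways_for_first must be between 0 and total_ways")
    first = (1 << ways_for_first) - 1
    second = ((1 << total_ways) - 1) & ~first
    return first, second


# Halve a 4-way L1D cache and a 16-way L2 cache.
l1d_sector0, l1d_sector1 = split_ways(4, 2)
l2_sector0, l2_sector1 = split_ways(16, 8)
print(f"L1D: sector0={l1d_sector0:#x} sector1={l1d_sector1:#x}")
print(f"L2:  sector0={l2_sector0:#x} sector1={l2_sector1:#x}")
```
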
The other part is how hardware knows which sector is being used at
any moment in time. In resctrl that is programmed by writing the
active class of service into needed register at the time the
application is context switched (resctrl_sched_in()). This seems
different here since as you describe the sector is chosen by bits in
the address. Even so, which bits to set in the address needs to be
programmed also and I also understand that there is a "default"
sector that can be programmed via register. Could these be equivalent to
what is done currently in resctrl?
Adding this function into resctrl, there is no need to write the active
class of service into a register. When running a task, the sector
id is decided by the [56:57] bits of the virtual address, and these bits are
programmed by users. When creating a resource group, the maximum number
of ways of each sector is set by the "IMP_SCCR" setting registers.
As long as the task is running in a certain resource group, the sector
and the maximum number of ways that the sector uses will not change.
Therefore, we need not consider context switches on A64FX.
The current interface would associate a "tasks" file with each sector to indicate
which tasks run with the particular sector id. I thought there was a way to
program the default sector id in a register, which is something that could be
done when a task is context switched in.
Otherwise there would need to be some re-architecting to remove the "tasks"
association. This would be a significant change.
--------
A64FX NUMA-PE-Cache Architecture:
NUMA0:
  PE0:
    L1sector0,L1sector1,L1sector2,L1sector3
  PE1:
    L1sector0,L1sector1,L1sector2,L1sector3
  ...
  PE11:
    L1sector0,L1sector1,L1sector2,L1sector3
  
  L2sector0,1/L2sector2,3
NUMA1:
  PE0:
    L1sector0,L1sector1,L1sector2,L1sector3
  ...
  PE11:
    L1sector0,L1sector1,L1sector2,L1sector3
  
  L2sector0,1/L2sector2,3
NUMA2:
  ...
NUMA3:
  ...
--------
In the A64FX processor, each L1 sector cache capacity setting register
belongs to one PE and is not shared among PEs. The L2 sector cache maximum
capacity setting registers are shared among the PEs in the same NUMA, so
changing these registers from one PE influences the other PEs.
The number of ways for the L2 sector IDs (0,1 or 2,3) can be set through
any PE in the same NUMA. The sector IDs 0,1 and 2,3 are not available at
the same time in the same NUMA.


I think, in your idea, a resource group will be created for each sector ID.
(> "sectors" could be considered the same as the resctrl "classes of service")
Then, an example resource group would be created as follows.
・ L1: NUMAX-PEY-L1sector0 (X = 0,1,2,3. Y = 0,1,2 ... 11),
・ L2: NUMAX-L2sector0 (X = 0,1,2,3)

In this example, the sector with the same ID (0) of all PEs is allocated to
the resource group. The L1D caches are numbered from NUMA0_PE0-L1sector0(0)
to NUMA3_PE11-L1sector0(47) and the L2 caches are numbered from
NUMA0-L2sector0(0) to NUMA3-L2sector0(3).
(NUMA number X is from 0-3, PE number Y is from 0-11)
(1) The number of ways of NUMAX-PEY-L1sector0 can be set independently
    for each PE (0-47). When running a task in this resource group,
    we cannot control which PE the task is running on or how many
    cache ways the task is using.
(2) Since L2 can only use 2 sectors at a time, when creating more than
    2 resource groups, L2sector0 will have to be allocated to a
    different resource group. If L2sector0 is shared by different
    resource groups, the L2 sector settings of those resource groups
    will influence each other.
etc... there are various problems, and no merit to using resctrl.


In my idea, in order to allocate the L1 and L2 cache to a resource
group, NUMA is allocated to the resource group.
An example resource group is as follows.
・ NUMA0-PEY-L1sectorZ (Y = 0,1,2...11. Z = 0,1,2,3)
・ NUMA0-L2sectorZZ (ZZ = 0,1,2,3)

  # cat /sys/fs/resctrl/p0/cpus
  0-11 *1
  # cat /sys/fs/resctrl/p0/schemata
  L1:0=0xF,0x3,0x1,0x0 *2
  L2:0=0xFFF,0xF,0,0 *3

*1: PEs belonging to one NUMA. (Of course, multiple NUMAs can also be
    specified in one resource group)
*2: The number of ways for L1sector0,1,2,3. In this resource group
    the number of ways of every sector0 is the same (0xF). If 0 ways are
    specified for a sector, that sector cannot be used. If 4 (0xF)
    ways are specified for a sector, that sector can use the cache fully.
    If 4 ways are specified for each sector, there will be no
    restriction on cache use.
*3: The number of ways for L2 sector 0,1. If L2sector0,1 is used,
    the number of ways of L2sector2,3 must be set to 0.
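
The per-sector schemata format proposed above, and the L2 rule in *3, can be checked mechanically. The following is a minimal Python sketch, assuming one comma-separated way mask per sector as in the p0 example. The function names are hypothetical, and this format is only a proposal in this mail, not an existing resctrl interface.

```python
from typing import List, Tuple


def parse_sector_line(line: str) -> Tuple[str, int, List[int]]:
    """Parse 'L2:0=0xFFF,0xF,0,0' into (resource, cache id, per-sector masks)."""
    resource, rest = line.split(":", 1)
    cache_id, masks = rest.split("=", 1)
    # int(..., 16) accepts hex digits with or without a leading "0x".
    return resource, int(cache_id), [int(m, 16) for m in masks.split(",")]


def ways_per_sector(masks: List[int]) -> List[int]:
    """Count the set bits of each mask, i.e. the number of ways per sector."""
    return [bin(mask).count("1") for mask in masks]


def l2_pairs_valid(masks: List[int]) -> bool:
    """Rule *3: L2 sectors 0,1 and sectors 2,3 must not be used together."""
    low_pair_used = any(masks[0:2])
    high_pair_used = any(masks[2:4])
    return not (low_pair_used and high_pair_used)


resource, cache_id, masks = parse_sector_line("L2:0=0xFFF,0xF,0,0")
print(resource, cache_id, ways_per_sector(masks), l2_pairs_valid(masks))
```
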

All sectors with the same ID in the same resource group are set to
the same number of ways, and when running a task on A64FX, the sector
ID used by the task is determined by the [56:57] bits of the virtual address.
By writing the PID to /sys/fs/resctrl/tasks, the task will be bound
to the resource group, and then the cache size used by the task will
never change.
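
As an illustration of the sector selection by address bits described above, here is a minimal Python sketch of the bit arithmetic only. It assumes the sector ID occupies the two bits at positions 56 and 57 of a 64-bit virtual address. The names `sector_id_of` and `tag_address` are hypothetical, and real use depends on the hardware's tagged-address behavior as defined in the HPC Extension specification.

```python
SECTOR_SHIFT = 56  # assumed position of the lower sector-selection bit
SECTOR_MASK = 0x3  # two bits, so sector IDs range from 0 to 3


def sector_id_of(addr: int) -> int:
    """Extract the sector ID carried in bits 56 and 57 of an address."""
    return (addr >> SECTOR_SHIFT) & SECTOR_MASK


def tag_address(addr: int, sector_id: int) -> int:
    """Return `addr` with bits 56 and 57 replaced by `sector_id`."""
    if not 0 <= sector_id <= SECTOR_MASK:
        raise ValueError("sector_id must be in the range 0..3")
    cleared = addr & ~(SECTOR_MASK << SECTOR_SHIFT)
    return cleared | (sector_id << SECTOR_SHIFT)


# Example address chosen arbitrarily for illustration.
tagged = tag_address(0x0000_7FFF_DEAD_B000, 2)
print(hex(tagged), sector_id_of(tagged))
```
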


Best regards,
Tan Shaopeng

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel