Re: [PATCH v4 1/3] sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag

From: Beata Michalska <hidden>
Date: 2021-05-18 16:34:22
Also in: lkml

On Tue, May 18, 2021 at 05:56:20PM +0200, Vincent Guittot wrote:

On Tue, 18 May 2021 at 17:48, Beata Michalska [off-list ref] wrote:

quoted

On Tue, May 18, 2021 at 05:28:11PM +0200, Vincent Guittot wrote:

quoted

On Tue, 18 May 2021 at 17:09, Beata Michalska [off-list ref] wrote:

quoted

On Tue, May 18, 2021 at 04:53:09PM +0200, Vincent Guittot wrote:

quoted

On Tue, 18 May 2021 at 16:27, Beata Michalska [off-list ref] wrote:

quoted

On Tue, May 18, 2021 at 03:39:27PM +0200, Vincent Guittot wrote:

quoted

On Mon, 17 May 2021 at 10:24, Beata Michalska [off-list ref] wrote:

quoted

Introduce a new sched_domain topology flag, complementary to
SD_ASYM_CPUCAPACITY, to distinguish between sched_domains where any CPU
capacity asymmetry is detected (SD_ASYM_CPUCAPACITY) and ones where
a full range of CPU capacities is visible to all domain members
(SD_ASYM_CPUCAPACITY_FULL).

I'm not sure about what you want to detect:

Is it a sched_domain level with a full range of cpu capacity, i.e.
with at least 1 min capacity and 1 max capacity ?
or do you want to get at least 1 cpu of each capacity ?

That would be at least one CPU of each available capacity within a given domain,
so the full -set- of available capacities within a domain.

It would be good to add that clarification.

Will do.

quoted

Although I'm not sure if that's the best policy compared to only
getting the range which would be far simpler to implement.
Do you have some topology example ?

An example from second patch from the series:

DIE      [                                ]
MC       [                       ][       ]

CPU       [0] [1] [2] [3] [4] [5]  [6] [7]
Capacity  |.....| |.....| |.....|  |.....|
             L       M       B        B

The one above, which is described in your patchset, works with the range policy.

Yeap, but that is just a variation of all the possibilities....

quoted

Where:
 arch_scale_cpu_capacity(L) = 512
 arch_scale_cpu_capacity(M) = 871
 arch_scale_cpu_capacity(B) = 1024

which could also look like:

DIE      [                                        ]
MC       [                       ][               ]

CPU       [0] [1] [2] [3] [4] [5]  [6] [7] [8] [9]
Capacity  |.....| |.....| |.....|  |.....| |.....|
             L       M       B        L       B

I know that the HW guys can come up with crazy ideas, but they would
probably add M instead of L with B in the 2nd cluster as a boost of
performance at the cost of powering up another "cluster", in which case
the range policy works as well.

quoted

Considering only the range would mean losing the 2 (M) CPUs out of sight
for feec in some cases.
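
To make the difference between the two policies concrete, here is a minimal
userspace sketch in C for the 10-CPU example above. It is not kernel code and
not taken from the patch set: the cpu_cap[] table, the helper names and the
reading of "range" as "system-wide minimum and maximum capacity" are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_EXAMPLE_CPUS 10

/* Capacities quoted in the thread: L = 512, M = 871, B = 1024. */
enum { CAP_L = 512, CAP_M = 871, CAP_B = 1024 };

/* Hypothetical capacity table for the 10-CPU topology (CPUs 0-9). */
static const unsigned long cpu_cap[NR_EXAMPLE_CPUS] = {
	CAP_L, CAP_L, CAP_M, CAP_M, CAP_B, CAP_B,	/* first MC:  CPUs 0-5 */
	CAP_L, CAP_L, CAP_B, CAP_B,			/* second MC: CPUs 6-9 */
};

/* True if any CPU in [first, last] has exactly the given capacity. */
static bool span_has_cap(size_t first, size_t last, unsigned long cap)
{
	assert(first <= last && last < NR_EXAMPLE_CPUS);

	for (size_t cpu = first; cpu <= last; cpu++)
		if (cpu_cap[cpu] == cap)
			return true;
	return false;
}

/* "Range" policy: the span holds a system-min and a system-max CPU. */
static bool span_covers_range(size_t first, size_t last)
{
	unsigned long min = cpu_cap[0], max = cpu_cap[0];

	for (size_t cpu = 1; cpu < NR_EXAMPLE_CPUS; cpu++) {
		if (cpu_cap[cpu] < min)
			min = cpu_cap[cpu];
		if (cpu_cap[cpu] > max)
			max = cpu_cap[cpu];
	}
	return span_has_cap(first, last, min) && span_has_cap(first, last, max);
}

/* "Full set" policy: every capacity value in the system is in the span. */
static bool span_covers_full_set(size_t first, size_t last)
{
	for (size_t cpu = 0; cpu < NR_EXAMPLE_CPUS; cpu++)
		if (!span_has_cap(first, last, cpu_cap[cpu]))
			return false;
	return true;
}
```

Under these assumptions the second MC domain (CPUs 6-9) satisfies the range
check but not the full-set check, because it contains no M CPU, while the
first MC domain and DIE satisfy both.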

Is it realistic ? Considering all the code and complexity added by
patch 2, will we really use it at the end ?

I do completely agree that the first approach was slightly .... blown out of
proportions, but with Peter's idea, the complexity has dropped significantly.
With the range being considered we are back to per domain tracking of available
capacities (min/max), plus additional cycles on comparing capacities.
Unless I fail to see the simplicity of that approach ?

With the range, you just have to keep track of one cpumask for min
capacity and one for max capacity (considering that the absolute max
capacity/1024 might not be in the cpumap) instead of tracking all
capacities and manipulating/updating a dynamic linked list. Then as soon
as you have 1 cpu from each mask you are done. At first glance
this seems to be simpler to do.
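
The two-mask check described above could look roughly like the following
userspace sketch. A uint64_t stands in for struct cpumask, and the struct and
function names are hypothetical; the kernel would use its own cpumask helpers
instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for struct cpumask: bit N set means CPU N is in the mask. */
typedef uint64_t cpumask_bits;

/* Hypothetical container for the two masks the range policy needs. */
struct cap_range_masks {
	cpumask_bits min_mask;	/* CPUs with the lowest capacity  */
	cpumask_bits max_mask;	/* CPUs with the highest capacity */
};

/* Build a mask covering CPUs first..last inclusive. */
static cpumask_bits cpu_range_mask(unsigned int first, unsigned int last)
{
	unsigned int n;

	assert(first <= last && last < 64);
	n = last - first + 1;
	return (n == 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1) << first;
}

/* Range policy: done as soon as the span hits one CPU from each mask. */
static bool domain_covers_range(const struct cap_range_masks *m,
				cpumask_bits span)
{
	return (span & m->min_mask) && (span & m->max_mask);
}
```

For the 10-CPU example, min_mask would hold CPUs 0-1 and 6-7 and max_mask
CPUs 4-5 and 8-9, so a span of CPUs 6-9 passes the check while a span of only
the M CPUs (2-3) does not.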

You would still have to go through all the capacities to find min/max:
so it's either going through all available CPUs twice, or tracking capacities
during the single go-through run. Those masks would also have to be updated to
cover hotplug events, when one of the two might become obsolete.
There is an option being considered to drop updating the list upon every
rebuild of sched domains, and that would simplify things even further.
I do not see any big gain in changing the approach, especially as the current
one covers all of the cases.
The idea is a good one though, so thank you for that.
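
As an illustration of the single go-through run and the hotplug concern
mentioned above, here is a small userspace sketch. It is only an assumption of
how such tracking might be written, with hypothetical names and a uint64_t
bitmap of online CPUs in place of the kernel's cpumasks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of one scan over the online CPUs. */
struct cap_minmax {
	unsigned long min_cap, max_cap;
	uint64_t min_mask, max_mask;	/* bit N set: CPU N has that capacity */
};

/*
 * Single pass: track min/max and rebuild the matching mask whenever a
 * new extreme is found. Offline CPUs are skipped, so the scan has to be
 * repeated after a hotplug event to keep the masks valid.
 */
static struct cap_minmax scan_capacities(const unsigned long *cap,
					 size_t nr_cpus, uint64_t online)
{
	struct cap_minmax r = { 0, 0, 0, 0 };

	assert(nr_cpus <= 64);

	for (size_t cpu = 0; cpu < nr_cpus; cpu++) {
		uint64_t bit = UINT64_C(1) << cpu;

		if (!(online & bit))
			continue;
		if (!r.min_mask || cap[cpu] < r.min_cap) {
			r.min_cap = cap[cpu];
			r.min_mask = 0;		/* old minimum is obsolete */
		}
		if (!r.max_mask || cap[cpu] > r.max_cap) {
			r.max_cap = cap[cpu];
			r.max_mask = 0;		/* old maximum is obsolete */
		}
		if (cap[cpu] == r.min_cap)
			r.min_mask |= bit;
		if (cap[cpu] == r.max_cap)
			r.max_mask |= bit;
	}
	return r;
}
```

With the 8-CPU example (L L M M B B B B), taking CPUs 0-1 offline changes the
minimum from 512 to 871 and moves min_mask to CPUs 2-3, which is the kind of
update a hotplug event would force on the range approach.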


---
BR
B.

quoted

---
BR
B.

quoted

Regards,
Vincent

quoted

---
BR.
B

quoted

---
BR
B.

quoted

With the distinction between full and partial CPU capacity asymmetry,
brought in by the newly introduced flag, the scope of the original
SD_ASYM_CPUCAPACITY flag gets shifted. It still maintains the existing
behaviour when asymmetry is detected on a given sched domain, allowing
misfit migrations within sched domains that do not observe the full range
of CPU capacities but still do have members with different capacity
values. It loses its meaning, though, when it comes to the lowest CPU
asymmetry sched_domain level per-cpu pointer, which is now to be
denoted by the SD_ASYM_CPUCAPACITY_FULL flag.

Signed-off-by: Beata Michalska <redacted>
Reviewed-by: Valentin Schneider <redacted>
---
 include/linux/sched/sd_flags.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 34b21e9..57bde66 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h

@@ -91,6 +91,16 @@ SD_FLAG(SD_WAKE_AFFINE, SDF_SHARED_CHILD)
 SD_FLAG(SD_ASYM_CPUCAPACITY, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)

 /*
+ * Domain members have different CPU capacities spanning all unique CPU
+ * capacity values.
+ *
+ * SHARED_PARENT: Set from the topmost domain down to the first domain where
+ *               all available CPU capacities are visible
+ * NEEDS_GROUPS: Per-CPU capacity is asymmetric between groups.
+ */
+SD_FLAG(SD_ASYM_CPUCAPACITY_FULL, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
+
+/*
  * Domain members share CPU capacity (i.e. SMT)
  *
  * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
--

2.7.4
