Re: [PATCH v4 1/3] sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
From: Beata Michalska <hidden>
Date: 2021-05-18 16:34:22
Also in: lkml
On Tue, May 18, 2021 at 05:56:20PM +0200, Vincent Guittot wrote:
On Tue, 18 May 2021 at 17:48, Beata Michalska [off-list ref] wrote:
On Tue, May 18, 2021 at 05:28:11PM +0200, Vincent Guittot wrote:
On Tue, 18 May 2021 at 17:09, Beata Michalska [off-list ref] wrote:
On Tue, May 18, 2021 at 04:53:09PM +0200, Vincent Guittot wrote:
On Tue, 18 May 2021 at 16:27, Beata Michalska [off-list ref] wrote:
On Tue, May 18, 2021 at 03:39:27PM +0200, Vincent Guittot wrote:
On Mon, 17 May 2021 at 10:24, Beata Michalska [off-list ref] wrote:
Introducing a new sched_domain topology flag, complementary to SD_ASYM_CPUCAPACITY, to distinguish between sched_domains where any CPU capacity asymmetry is detected (SD_ASYM_CPUCAPACITY) and ones where a full range of CPU capacities is visible to all domain members (SD_ASYM_CPUCAPACITY_FULL).

I'm not sure about what you want to detect: is it a sched_domain level with a full range of CPU capacity, i.e. with at least 1 min capacity and 1 max capacity? Or do you want to get at least 1 CPU of each capacity?

That would be at least one CPU of each available capacity within a given domain, so the full -set- of available capacities within a domain.

It would be good to state that precisely.

Will do.
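To pin down the "full set" criterion, here is a minimal userspace C sketch, assuming a plain array of per-CPU capacities and a 64-bit mask standing in for a kernel cpumask. The helper name is hypothetical and this is not the patch's implementation: a domain qualifies only if every distinct capacity value present in the system has at least one representative among the domain's CPUs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): returns true if the domain
 * described by @span contains at least one CPU of every distinct capacity
 * found in @cap[0..nr_cpus-1]. Bit N of @span set means CPU N is a member;
 * a plain uint64_t stands in for a kernel cpumask, so nr_cpus <= 64.
 */
static bool domain_sees_all_capacities(const unsigned long *cap,
				       size_t nr_cpus, uint64_t span)
{
	for (size_t i = 0; i < nr_cpus; i++) {
		bool seen = false;

		/* Look for a member CPU whose capacity matches CPU i's. */
		for (size_t j = 0; j < nr_cpus && !seen; j++)
			seen = ((span >> j) & 1) && cap[j] == cap[i];

		/* Some capacity value has no representative in the domain. */
		if (!seen)
			return false;
	}
	return true;
}
```

For example, with eight CPUs of capacities 512, 512, 871, 871, 1024, 1024, 1024, 1024, a span covering CPUs 0-5 passes, while spans covering only CPUs 6-7 or only CPUs 0-3 do not.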
Although I'm not sure if that's the best policy compared to only getting the range, which would be far simpler to implement. Do you have a topology example?

An example from the second patch of the series:

     DIE      [                                ]
     MC       [                       ][       ]

     CPU       [0] [1] [2] [3] [4] [5]  [6] [7]
     Capacity  |.....| |.....| |.....|  |.....|
                  L       M       B        B

The one above, which is described in your patchset, works with the range policy.

Yep, but that is just one variation among all the possibilities...
Where:

     arch_scale_cpu_capacity(L) = 512
     arch_scale_cpu_capacity(M) = 871
     arch_scale_cpu_capacity(B) = 1024

which could also look like:

     DIE      [                                        ]
     MC       [                       ][               ]

     CPU       [0] [1] [2] [3] [4] [5]  [6] [7] [8] [9]
     Capacity  |.....| |.....| |.....|  |.....| |.....|
                  L       M       B        L       B

I know that HW guys can come up with crazy ideas, but they would probably add M instead of L alongside B in the 2nd cluster, as a performance boost at the cost of powering up another "cluster", in which case the range policy works as well.
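The ten-CPU layout is where a pure min/max range check and the full-set check part ways. A small sketch, assuming the second MC cluster spans CPUs 6-9 as drawn and using the capacities listed above; the helper name and the bitmask representation are hypothetical. It counts how many distinct capacity values a span contains, so a domain sees the full set when its count equals the count for the whole system.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of distinct capacity values among the CPUs in
 * @span (bit N set = CPU N is a member, nr_cpus <= 64).
 */
static size_t distinct_caps_in_span(const unsigned long *cap, size_t nr_cpus,
				    uint64_t span)
{
	size_t count = 0;

	for (size_t i = 0; i < nr_cpus; i++) {
		int first = 1;

		if (!((span >> i) & 1))
			continue;
		/* Count CPU i only if no lower-numbered member shares its value. */
		for (size_t j = 0; j < i; j++) {
			if (((span >> j) & 1) && cap[j] == cap[i]) {
				first = 0;
				break;
			}
		}
		count += first;
	}
	return count;
}
```

With CPUs 0-1 L, 2-3 M, 4-5 B, 6-7 L and 8-9 B, the whole system and the first MC cluster (CPUs 0-5) each contain 3 distinct values, while the second cluster (CPUs 6-9) contains only 2: it holds both the minimum and the maximum capacity, yet no M CPU.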
Considering only the range would mean losing sight of the 2 (M) CPUs for feec (find_energy_efficient_cpu()) in some cases.

Is it realistic? Considering all the code and complexity added by patch 2, will we really use it in the end?

I do completely agree that the first approach was slightly... blown out of proportion, but with Peter's idea, the complexity has dropped significantly. With the range being considered, we are back to per-domain tracking of available capacities (min/max), plus additional cycles spent comparing capacities. Unless I fail to see the simplicity of that approach?

With the range, you just have to keep track of one cpumask for the min capacity and one for the max capacity (considering that the absolute max capacity, 1024, might not be in the cpumap) instead of tracking all capacities and manipulating/updating a dynamic linked list. Then, as soon as a domain has 1 CPU from each mask, you are done. At first glance this seems simpler to do.
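The two-mask range check proposed here can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: plain 64-bit masks stand in for cpumasks, the names are hypothetical, at least one CPU is assumed, and the extremes are taken from the capacities actually present rather than from a fixed 1024. The masks are built in two passes (find the extreme values, then collect the CPUs holding them), after which each domain needs only two mask intersections.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for two kernel cpumasks (bit N = CPU N, N < 64). */
struct cap_extremes {
	uint64_t min_mask;	/* CPUs with the lowest capacity present */
	uint64_t max_mask;	/* CPUs with the highest capacity present */
};

/* Two passes: find the extreme values, then collect the CPUs holding them. */
static struct cap_extremes build_extremes(const unsigned long *cap,
					  size_t nr_cpus)
{
	struct cap_extremes ex = { 0, 0 };
	unsigned long lo = cap[0], hi = cap[0];

	for (size_t i = 1; i < nr_cpus; i++) {
		if (cap[i] < lo)
			lo = cap[i];
		if (cap[i] > hi)
			hi = cap[i];
	}
	for (size_t i = 0; i < nr_cpus; i++) {
		if (cap[i] == lo)
			ex.min_mask |= UINT64_C(1) << i;
		if (cap[i] == hi)
			ex.max_mask |= UINT64_C(1) << i;
	}
	return ex;
}

/* A domain qualifies as soon as it holds one CPU from each mask. */
static bool domain_hits_both(const struct cap_extremes *ex, uint64_t span)
{
	return (span & ex->min_mask) && (span & ex->max_mask);
}
```

With the ten-CPU layout from earlier in the thread (CPUs 0-1 L, 2-3 M, 4-5 B, 6-7 L, 8-9 B), the cluster of CPUs 6-9 satisfies this test although it has no M CPU, which is the case where the M CPUs could drop out of view.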
You would still have to go through all the capacities to find min/max: so it's either going through all available CPUs twice, or tracking capacities during a single pass. Those masks would also have to be updated to cover hotplug events, when one of the two might become obsolete. One option being considered is to drop updating the list upon every rebuild of sched domains, which would simplify things even further. I do not see any big gain in changing the approach, especially as the current one covers all of the cases. The idea is a good one though, so thank you for that.

---
BR
B.
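A single-pass variant of the mask construction, which also shows the hotplug concern: a sketch under the same kind of assumptions (hypothetical names, 64-bit masks in place of cpumasks), scanning only the CPUs set in an online mask. A strictly smaller or larger capacity restarts the corresponding mask; an equal one is added to it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for two kernel cpumasks (bit N = CPU N, N < 64). */
struct cap_extremes {
	unsigned long min_cap, max_cap;
	uint64_t min_mask, max_mask;
};

/*
 * Single pass over the CPUs set in @online. Re-running it after a hotplug
 * event drops an extreme whose last CPU has gone offline.
 */
static struct cap_extremes scan_extremes(const unsigned long *cap,
					 size_t nr_cpus, uint64_t online)
{
	struct cap_extremes ex = { 0, 0, 0, 0 };

	for (size_t i = 0; i < nr_cpus; i++) {
		uint64_t bit = UINT64_C(1) << i;

		if (!(online & bit))
			continue;
		/* New minimum restarts the mask; an equal value joins it. */
		if (!ex.min_mask || cap[i] < ex.min_cap) {
			ex.min_cap = cap[i];
			ex.min_mask = bit;
		} else if (cap[i] == ex.min_cap) {
			ex.min_mask |= bit;
		}
		/* Same rule for the maximum. */
		if (!ex.max_mask || cap[i] > ex.max_cap) {
			ex.max_cap = cap[i];
			ex.max_mask = bit;
		} else if (cap[i] == ex.max_cap) {
			ex.max_mask |= bit;
		}
	}
	return ex;
}
```

With the ten-CPU layout, taking the B CPUs (4-5 and 8-9) offline and rescanning moves the maximum to 871 with CPUs 2-3 as its mask; masks kept from the earlier scan would still point at CPUs that are gone, so they have to be rebuilt on such events.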
Regards,
Vincent
With the distinction between full and partial CPU capacity asymmetry brought in by the newly introduced flag, the scope of the original SD_ASYM_CPUCAPACITY flag gets shifted. It still maintains the existing behaviour when detected on a given sched domain, allowing misfit migrations within sched domains that do not observe the full range of CPU capacities but still have members with different capacity values. It does, however, lose its meaning when it comes to the per-CPU pointer to the lowest CPU-asymmetry sched_domain level, which is now to be denoted by the SD_ASYM_CPUCAPACITY_FULL flag.

Signed-off-by: Beata Michalska <redacted>
Reviewed-by: Valentin Schneider <redacted>
---
 include/linux/sched/sd_flags.h | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/include/linux/sched/sd_flags.h b/include/linux/sched/sd_flags.h
index 34b21e9..57bde66 100644
--- a/include/linux/sched/sd_flags.h
+++ b/include/linux/sched/sd_flags.h
@@ -91,6 +91,16 @@ SD_FLAG(SD_WAKE_AFFINE, SDF_SHARED_CHILD)
 SD_FLAG(SD_ASYM_CPUCAPACITY, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
 
 /*
+ * Domain members have different CPU capacities spanning all unique CPU
+ * capacity values.
+ *
+ * SHARED_PARENT: Set from the topmost domain down to the first domain where
+ *                all available CPU capacities are visible
+ * NEEDS_GROUPS: Per-CPU capacity is asymmetric between groups.
+ */
+SD_FLAG(SD_ASYM_CPUCAPACITY_FULL, SDF_SHARED_PARENT | SDF_NEEDS_GROUPS)
+
+/*
  * Domain members share CPU capacity (i.e. SMT)
  *
  * SHARED_CHILD: Set from the base domain up until spanned CPUs no longer share
--
2.7.4
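The SHARED_PARENT semantics in the new flag's comment (set from the topmost domain down to the first domain where all available CPU capacities are visible) can be sketched as a bottom-up walk over one CPU's domain levels. The flag value, function name and array layout are hypothetical stand-ins, not kernel code, and whether each level sees every capacity is assumed to be computed beforehand.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for SD_ASYM_CPUCAPACITY_FULL. */
#define ASYM_FULL_SKETCH 0x1u

/*
 * @flags[0] is the lowest domain level of one CPU, @flags[nr - 1] the topmost.
 * @sees_all[k] says whether level k spans every distinct CPU capacity.
 * The flag goes on the first such level and on every level above it.
 * Returns that level's index (what a per-CPU "lowest full-asymmetry domain"
 * pointer would reference), or -1 if no level qualifies.
 */
static int mark_asym_full(unsigned int *flags, const bool *sees_all, int nr)
{
	int lowest = -1;

	for (int k = 0; k < nr; k++) {
		if (lowest < 0 && sees_all[k])
			lowest = k;
		if (lowest >= 0)
			flags[k] |= ASYM_FULL_SKETCH;
	}
	return lowest;
}
```

For CPU 6 in the ten-CPU example, the MC level (CPUs 6-9) lacks an M CPU while the DIE level sees all capacities, so the flag lands on DIE only and index 1 is returned.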