Thread: 29 messages, 4 authors, 2021-06-03

Re: [PATCH v5 2/3] sched/topology: Rework CPU capacity asymmetry detection

From: Valentin Schneider <hidden>
Date: 2021-05-24 18:01:14
Also in: lkml

Hi Beata,

On 24/05/21 11:16, Beata Michalska wrote:
Currently the CPU capacity asymmetry detection, performed through
asym_cpu_capacity_level, tries to identify the lowest topology level
at which the highest CPU capacity is being observed, not necessarily
finding the level at which all possible capacity values are visible
to all CPUs, which might be a bit problematic for some possible/valid
asymmetric topologies, e.g.:

DIE      [                                ]
MC       [                       ][       ]

CPU       [0] [1] [2] [3] [4] [5]  [6] [7]
Capacity  |.....| |.....| |.....|  |.....|
             L       M       B        B

Where:
 arch_scale_cpu_capacity(L) = 512
 arch_scale_cpu_capacity(M) = 871
 arch_scale_cpu_capacity(B) = 1024

In this particular case, the asymmetric topology level will point
at MC, as all possible CPU masks for that level do cover the CPU
with the highest capacity. It will work just fine for the first
cluster, not so much for the second one though (consider
find_energy_efficient_cpu(), which might end up attempting the energy
aware wake-up for a domain that does not see any asymmetry at all).

Rework the way the capacity asymmetry levels are being detected, so
that the detection points to the lowest topology level (for a given CPU)
where the full set of available CPU capacities is visible to all CPUs
within a given domain. As a result, the per-cpu sd_asym_cpucapacity might
differ across the domains. This will have an impact on EAS wake-up
placement in a way that it might see a different range of CPUs to be
considered, depending on the given current and target CPUs.

Additionally, those levels where any range of asymmetry (not
necessarily full) is detected will get identified as well.
The selected asymmetric topology level will be denoted by
SD_ASYM_CPUCAPACITY_FULL sched domain flag whereas the 'sub-levels'
would receive the already used SD_ASYM_CPUCAPACITY flag. This allows
maintaining the current behaviour for asymmetric topologies, with
misfit migration operating correctly on lower levels, if applicable,
as any asymmetry is enough to trigger the misfit migration.
The logic there relies on the SD_ASYM_CPUCAPACITY flag and does not
relate to the full asymmetry level denoted by the sd_asym_cpucapacity
pointer.
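
To make the two flags concrete, here is a minimal userspace sketch of the classification described above, applied to the example topology (L = CPUs 0-1, M = CPUs 2-3, B = CPUs 4-7). It is not the kernel code: ASYM and ASYM_FULL are hypothetical stand-ins for SD_ASYM_CPUCAPACITY and SD_ASYM_CPUCAPACITY_FULL with arbitrary values, CPU masks are plain bitmasks (CPU n is bit n), and the names classify_span, cap_class and example_caps are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the sched domain flags; values are arbitrary. */
#define ASYM		0x1u	/* models SD_ASYM_CPUCAPACITY */
#define ASYM_FULL	0x2u	/* models SD_ASYM_CPUCAPACITY_FULL */

/* One capacity value and the CPUs (bit n == CPU n) that have it. */
struct cap_class {
	unsigned long capacity;
	unsigned int cpus;
};

/* Example topology: L = CPUs 0-1, M = CPUs 2-3, B = CPUs 4-7. */
const struct cap_class example_caps[] = {
	{  512, 0x03 },
	{  871, 0x0c },
	{ 1024, 0xf0 },
};

/*
 * Classify a domain span:
 *   0                 fewer than two capacity values visible in the span
 *   ASYM              some asymmetry, but an online capacity is missing
 *   ASYM | ASYM_FULL  every online capacity value is visible in the span
 */
unsigned int classify_span(unsigned int span, unsigned int online,
			   const struct cap_class *caps, size_t nr)
{
	unsigned int seen = 0, missing = 0;
	size_t i;

	assert(caps != NULL);

	for (i = 0; i < nr; i++) {
		if (caps[i].cpus & span) {
			seen++;
			continue;
		}
		/* Only count a capacity as missing if some CPU with it is online. */
		if (caps[i].cpus & online)
			missing++;
	}

	if (seen < 2)
		return 0;
	return missing ? ASYM : (ASYM | ASYM_FULL);
}
```

With all eight CPUs online, the first MC span (0x3f) and the DIE span (0xff) both classify as full asymmetry, while the second MC span (0xc0) sees only B and gets no flag at all, which is the situation the commit message describes for the second cluster.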

CPU capacity asymmetry detection is based on a set of
available CPU capacities for all possible CPUs. This data is
generated upon init and updated once CPU topology changes are
detected (through arch_update_cpu_topology). As such, any changes
to identified CPU capacities (like initializing cpufreq) need to be
explicitly advertised by corresponding archs to trigger rebuilding
the data.

This patch also removes the additional -dflags- parameter used when
  ^^^^^^^^^^^^^^^^^^^^^^^
s/^/Also remove/
building sched domains as the asymmetry flags are now being set
directly in sd_init.
Few nits below, but beyond that:

Tested-by: Valentin Schneider <redacted>
Reviewed-by: Valentin Schneider <redacted>
+static inline int
+asym_cpu_capacity_classify(struct sched_domain *sd,
+			   const struct cpumask *cpu_map)
+{
+	int sd_asym_flags = SD_ASYM_CPUCAPACITY | SD_ASYM_CPUCAPACITY_FULL;
+	struct asym_cap_data *entry;
+	int asym_cap_count = 0;
+
+	if (list_is_singular(&asym_cap_list))
+		goto leave;
+
+	list_for_each_entry(entry, &asym_cap_list, link) {
+		if (cpumask_intersects(sched_domain_span(sd), entry->cpu_mask)) {
+			++asym_cap_count;
+		} else {
+			/*
+			 * CPUs with given capacity might be offline
+			 * so make sure this is not the case
+			 */
+			if (cpumask_intersects(entry->cpu_mask, cpu_map)) {
+				sd_asym_flags &= ~SD_ASYM_CPUCAPACITY_FULL;
+				if (asym_cap_count > 1)
+					break;
+			}
Readability nit: That could be made into an else if ().

+		}
+	}
+	WARN_ON_ONCE(!asym_cap_count);
+leave:
+	return asym_cap_count > 1 ? sd_asym_flags : 0;
+}
+
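
For reference, the "else if" readability nit on the loop above could look roughly like the sketch below. This is a userspace model under stated assumptions, not the patch itself: masks are plain bitmasks, the list walk is an array walk, the list_is_singular() shortcut is left out, ASYM and ASYM_FULL are arbitrary stand-ins for the real flags, and every function name here is made up. classify_nested mirrors the shape of the posted loop, classify_flat folds the inner 'if' into an 'else if', and forms_agree checks that the two return the same flags for every possible span.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the sched domain flags; values are arbitrary. */
#define ASYM		0x1u
#define ASYM_FULL	0x2u

struct cap_entry {
	unsigned int cpus;	/* bit n set == CPU n has this capacity */
};

/* Example topology: L = CPUs 0-1, M = CPUs 2-3, B = CPUs 4-7. */
const struct cap_entry example_entries[] = {
	{ 0x03 }, { 0x0c }, { 0xf0 },
};

/* Same shape as the posted loop: an 'if' nested inside the 'else'. */
unsigned int classify_nested(unsigned int span, unsigned int map,
			     const struct cap_entry *e, size_t nr)
{
	unsigned int flags = ASYM | ASYM_FULL;
	int count = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (span & e[i].cpus) {
			++count;
		} else {
			if (e[i].cpus & map) {
				flags &= ~ASYM_FULL;
				if (count > 1)
					break;
			}
		}
	}
	return count > 1 ? flags : 0;
}

/* Flattened form: one level of indentation less, same behaviour. */
unsigned int classify_flat(unsigned int span, unsigned int map,
			   const struct cap_entry *e, size_t nr)
{
	unsigned int flags = ASYM | ASYM_FULL;
	int count = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (span & e[i].cpus) {
			++count;
		} else if (e[i].cpus & map) {
			/* Capacity is online somewhere, but not in this span. */
			flags &= ~ASYM_FULL;
			if (count > 1)
				break;
		}
	}
	return count > 1 ? flags : 0;
}

/* Returns 1 if both forms agree for every span over nr_cpus CPUs. */
int forms_agree(const struct cap_entry *e, size_t nr, unsigned int nr_cpus)
{
	unsigned int map, span;

	assert(nr_cpus > 0 && nr_cpus < 32);
	map = (1u << nr_cpus) - 1;

	for (span = 0; span <= map; span++)
		if (classify_nested(span, map, e, nr) !=
		    classify_flat(span, map, e, nr))
			return 0;
	return 1;
}
```

The early break is kept in both forms: once the full flag has been cleared and at least two capacities have been seen, the result can no longer change.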
+static void asym_cpu_capacity_scan(void)
+{
+	struct asym_cap_data *entry, *next;
+	int cpu;
+
+	list_for_each_entry(entry, &asym_cap_list, link)
+		cpumask_clear(entry->cpu_mask);
+
+	entry = list_first_entry_or_null(&asym_cap_list,
+					 struct asym_cap_data, link);
+
+	for_each_cpu_and(cpu, cpu_possible_mask,
+			 housekeeping_cpumask(HK_FLAG_DOMAIN)) {
+		unsigned long capacity = arch_scale_cpu_capacity(cpu);
+
+		if (!entry || capacity != entry->capacity)
+			entry = asym_cpu_capacity_get_data(capacity);
+		if (entry)
+			__cpumask_set_cpu(cpu, entry->cpu_mask);
That 'if' is only there in case the alloc within the helper failed, which
is a bit of a shame.

You could pass the CPU to that helper function and have it set the right
bit, or you could even forgo the capacity != entry->capacity check here and
let the helper function do it all.

Yes, that means more asym_cap_list iterations, but that's
O(nr_cpus * nr_caps); a topology rebuild is along the lines of
O(nr_cpus² * nr_topology_levels), so not such a big deal comparatively.
+	}
+
+	list_for_each_entry_safe(entry, next, &asym_cap_list, link) {
+		if (cpumask_empty(entry->cpu_mask)) {
+			list_del(&entry->link);
+			kfree(entry);
+		}
+	}
+}
+
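
To illustrate the suggestion about the helper, here is a rough userspace sketch of a variant that takes the CPU and sets the bit itself, so the scan loop needs neither the 'capacity != entry->capacity' shortcut nor the 'if (entry)' check. Everything in it is an assumption made for illustration: a singly linked list and calloc() stand in for the kernel list and allocator, masks are plain bitmasks, and cap_list_update, cap_list_scan, cap_list_find and cap_list_free are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdlib.h>

struct cap_entry {
	struct cap_entry *next;
	unsigned long capacity;
	unsigned int cpus;	/* bit n set == CPU n has this capacity */
};

/*
 * Find the entry for @capacity, allocating it if needed, and mark @cpu in it.
 * Returns 0 on success, -1 if the allocation failed.
 */
int cap_list_update(struct cap_entry **head, unsigned int cpu,
		    unsigned long capacity)
{
	struct cap_entry *e;

	assert(cpu < 32);

	for (e = *head; e && e->capacity != capacity; e = e->next)
		;

	if (!e) {
		e = calloc(1, sizeof(*e));
		if (!e)
			return -1;
		e->capacity = capacity;
		e->next = *head;
		*head = e;
	}

	e->cpus |= 1u << cpu;
	return 0;
}

/* Scan loop: one helper call per CPU, O(nr_cpus * nr_caps) list walking. */
int cap_list_scan(struct cap_entry **head, const unsigned long *caps,
		  unsigned int nr_cpus)
{
	unsigned int cpu;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (cap_list_update(head, cpu, caps[cpu]))
			return -1;
	return 0;
}

/* Look up the entry for @capacity, or NULL if there is none. */
const struct cap_entry *cap_list_find(const struct cap_entry *head,
				      unsigned long capacity)
{
	for (; head; head = head->next)
		if (head->capacity == capacity)
			return head;
	return NULL;
}

void cap_list_free(struct cap_entry *head)
{
	while (head) {
		struct cap_entry *next = head->next;

		free(head);
		head = next;
	}
}
```

Feeding it the per-CPU capacities of the example topology (512, 512, 871, 871, 1024, 1024, 1024, 1024) yields three entries with masks 0x03, 0x0c and 0xf0. The allocation-failure handling stays inside the helper, at the cost of one list walk per CPU.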