Thread: 9 messages, 2 authors, 2022-03-01

Re: [PATCH v2 2/3] arch_topology: obtain cpu capacity using information from CPPC

From: "Rafael J. Wysocki" <rafael@kernel.org>
Date: 2021-08-26 18:33:07
Also in: linux-arm-kernel, lkml

On Thu, Aug 26, 2021 at 7:51 PM Ionela Voinescu [off-list ref] wrote:
Thanks for the review, Rafael!

On Wednesday 25 Aug 2021 at 19:54:26 (+0200), Rafael J. Wysocki wrote:
On Tue, Aug 24, 2021 at 12:57 PM Ionela Voinescu
[off-list ref] wrote:
Define topology_init_cpu_capacity_cppc() to use highest performance
values from _CPC objects to obtain and set maximum capacity information
for each CPU. acpi_cppc_processor_probe() is a good point at which to
trigger the initialization of CPU (u-arch) capacity values, as at this
point the highest performance values can be obtained from each CPU's
_CPC objects. Architectures can therefore use this functionality
through arch_init_invariance_cppc().

The performance scale used by CPPC is a unified scale for all CPUs in
the system. Therefore, by obtaining the raw highest performance values
from the _CPC objects, and normalizing them on the [0, 1024] capacity
scale, used by the task scheduler, we obtain the CPU capacity of each
CPU.

While an ACPI Notify(0x85) could alert about a change in the highest
performance value, which should in turn retrigger the CPU capacity
computations, this notification is not currently handled by the ACPI
processor driver. When supported, a call to arch_init_invariance_cppc()
would perform the update.

Signed-off-by: Ionela Voinescu <redacted>
Tested-by: Valentin Schneider <redacted>
Cc: Sudeep Holla <redacted>
---
 drivers/base/arch_topology.c  | 37 +++++++++++++++++++++++++++++++++++
 include/linux/arch_topology.h |  4 ++++
 2 files changed, 41 insertions(+)
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 921312a8d957..358e22cd629e 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -306,6 +306,43 @@ bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
        return !ret;
 }

+#ifdef CONFIG_ACPI_CPPC_LIB
+#include <acpi/cppc_acpi.h>
+
+void topology_init_cpu_capacity_cppc(void)
+{
+       struct cppc_perf_caps perf_caps;
+       int cpu;
+
+       if (likely(acpi_disabled || !acpi_cpc_valid()))
+               return;
+
+       raw_capacity = kcalloc(num_possible_cpus(), sizeof(*raw_capacity),
+                              GFP_KERNEL);
+       if (!raw_capacity)
+               return;
+
+       for_each_possible_cpu(cpu) {
+               if (!cppc_get_perf_caps(cpu, &perf_caps)) {
+                       raw_capacity[cpu] = perf_caps.highest_perf;
From experience, I would advise doing some sanity checking on the
perf_caps values before using them here.
cppc_get_perf_caps() already returns -EFAULT if highest_perf is 0, and
I'm not sure if I can make any other assumptions about what a sane
highest_perf value would need to be here.
Well, it cannot be less than lowest_perf or nominal_perf or
guaranteed_perf, for instance.
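
The ordering constraints named above can be sketched as a small standalone check. This is only an illustration, not code from the patch: struct perf_caps_sketch and perf_caps_sane() are hypothetical names, and the struct mirrors only the four values mentioned in this thread rather than the kernel's struct cppc_perf_caps.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the performance capabilities read from _CPC.
 * Only the values discussed in this thread are included.
 */
struct perf_caps_sketch {
	uint32_t guaranteed_perf;
	uint32_t highest_perf;
	uint32_t nominal_perf;
	uint32_t lowest_perf;
};

/*
 * Return true if highest_perf is non-zero and not below any of the
 * other reported levels. A guaranteed_perf of 0 passes trivially.
 */
static bool perf_caps_sane(const struct perf_caps_sketch *caps)
{
	if (caps->highest_perf == 0)
		return false;
	if (caps->highest_perf < caps->lowest_perf)
		return false;
	if (caps->highest_perf < caps->nominal_perf)
		return false;
	if (caps->highest_perf < caps->guaranteed_perf)
		return false;
	return true;
}
```

A caller could skip, or fall back to a default for, any CPU whose values fail this check; which of the two is appropriate is not settled in this thread.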
Did you have anything else in mind for sanity checking?
Also note that highest_perf may not be sustainable, so would using
highest_perf as raw_capacity[] always work as expected?
Yes, in my opinion using it is better than the alternative, using the
nominal performance value. This highest performance value helps obtain
the maximum capacity of a CPU on a scale [0, 1024] when referenced to
the highest performance of the biggest CPU in the system. There is no
assumption in the task scheduler that this capacity is sustainable.
That's true, but there are consequences if it is the case.  Namely,
you may find that the big CPUs run at the highest performance for a
small fraction of time, so most of the time they may appear to be
underutilized no matter how many tasks are packed on them, which then
will influence the utilization metrics of those tasks.

It may be better to use guaranteed_perf or some value between it and
highest_perf for this reason.
Using lower values (nominal performance) would shorten the scale and
make smaller CPUs seem bigger than they are. Also, using highest
performance gives a better indication of micro-architectural
differences in performance between CPUs, which plays a role in scaling
utilization, even if some of the performance levels are not sustainable
(which is platform dependent).
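
The "shortened scale" effect can be illustrated with a minimal sketch of normalizing raw values onto [0, 1024] relative to the largest one. The function name, the macro, and the example numbers are assumptions made for illustration; this is not the kernel's normalization code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the top of the scheduler's capacity scale. */
#define CAPACITY_SCALE_SKETCH 1024u

/*
 * Scale each raw performance value so that the largest one maps to
 * 1024. If every raw value is 0, all capacities are set to 0.
 */
static void normalize_capacity_sketch(const uint32_t *raw, uint32_t *cap,
				      size_t n)
{
	uint32_t max = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (raw[i] > max)
			max = raw[i];

	for (i = 0; i < n; i++)
		cap[i] = max ?
			(uint32_t)(((uint64_t)raw[i] * CAPACITY_SCALE_SKETCH) / max) : 0;
}
```

With made-up values for a little CPU (highest 200, nominal 200) and a big CPU (highest 800, nominal 500), normalizing the highest values gives capacities 256 and 1024, while normalizing the nominal values gives 409 and 1024, so the little CPU looks larger relative to the big one.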