Thread: 41 messages, 4 authors, 2022-01-06

Re: cpufreq: intel_pstate: map utilization into the pstate range

From: "Rafael J. Wysocki" <rafael@kernel.org>
Date: 2021-12-19 14:19:22
Also in: lkml

On Sun, Dec 19, 2021 at 7:42 AM Julia Lawall [off-list ref] wrote:


On Sat, 18 Dec 2021, Francisco Jerez wrote:
Julia Lawall [off-list ref] writes:
On Sat, 18 Dec 2021, Francisco Jerez wrote:
Julia Lawall [off-list ref] writes:
As you can see in intel_pstate.c, min_pstate is initialized on core
platforms from MSR_PLATFORM_INFO[47:40], which is "Maximum Efficiency
Ratio (R/O)".  However that seems to deviate massively from the most
efficient ratio on your system, which may indicate a firmware bug, some
sort of clock gating problem, or an issue with the way that
intel_pstate.c processes this information.
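
As a minimal sketch of the bit-field extraction described above (not the
driver's actual code), the ratio can be pulled out of a raw 64-bit
MSR_PLATFORM_INFO value as follows.  The function name, the example value
and the 100 MHz bus clock are assumptions made for illustration.

```python
def max_efficiency_ratio(platform_info: int) -> int:
    """Return bits 47:40 of a raw MSR_PLATFORM_INFO value."""
    return (platform_info >> 40) & 0xFF


# Made-up raw value for illustration only: ratio 10 in bits 47:40,
# plus some unrelated bits set lower down.
example_value = (10 << 40) | (21 << 8)

ratio = max_efficiency_ratio(example_value)
# Assumption: 100 MHz bus clock, so the ratio maps to ratio * 100 MHz.
print(f"ratio {ratio} -> {ratio * 100} MHz")
```
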
I'm not sure I understand the bug part.  min_pstate gives the frequency
that I find as the minimum frequency when I look for the specifications of
the CPU.  Should one expect it to be something different?
I'd expect the minimum frequency on your processor specification to
roughly match the "Maximum Efficiency Ratio (R/O)" value from that MSR,
since there's little reason to claim your processor can be clocked down
to a frequency which is inherently inefficient /and/ slower than the
maximum efficiency ratio -- In fact they both seem to match in your
system, they're just nowhere close to the frequency which is actually
most efficient, which smells like a bug, like your processor
misreporting what the most efficient frequency is, or it deviating from
the expected one due to your CPU static power consumption being greater
than it would be expected to be under ideal conditions -- E.g. due to
some sort of clock gating issue, possibly due to a software bug, or due
to our scheduling of such workloads with a large amount of lightly
loaded threads being unnecessarily inefficient which could also be
preventing most of your CPU cores from ever being clock-gated even
though your processor may be sitting idle for a large fraction of their
runtime.
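
To illustrate why higher static power would push the most efficient
frequency upward, here is a toy model.  It is purely an assumption for
illustration and is not taken from the thread or from any processor
documentation: power is modelled as p_static + k * f**3, and a fixed amount
of work takes work / f seconds.  Energy is then
work * (p_static / f + k * f**2), which is minimal at
f = (p_static / (2 * k)) ** (1/3).

```python
def energy(f: float, p_static: float, k: float, work: float = 1.0) -> float:
    """Energy to finish `work` cycles at frequency f under the toy model."""
    return work * (p_static / f + k * f ** 2)


def most_efficient_frequency(p_static: float, k: float) -> float:
    """Frequency minimising energy(): solves -p_static/f**2 + 2*k*f = 0."""
    return (p_static / (2.0 * k)) ** (1.0 / 3.0)


# Hypothetical, unitless parameters: doubling the static power (for
# instance, a second package that never reaches a deep sleep state)
# moves the optimum up by a factor of 2 ** (1/3).
f_one = most_efficient_frequency(p_static=1.0, k=0.5)
f_two = most_efficient_frequency(p_static=2.0, k=0.5)
print(f_one, f_two, f_two / f_one)
```

In this model the shift depends only on the ratio of static to dynamic
power, so it says nothing about the absolute frequencies on a real system.
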
The original mail has results from two different machines: Intel 6130
(skylake) and Intel 5218 (cascade lake).  I have access to another cluster
of 6130s and 5218s.  I can try them.

I tried 5.9 in which I just commented out the schedutil code that makes
frequency requests.  I only tested avrora (tiny pauses) and h2 (longer
pauses) and in both cases the execution is almost entirely in the turbo
frequencies.

I'm not sure I understand the term "clock-gated".  What C state does that
correspond to?  The turbostat output for one run of avrora is below.
I didn't have any specific C1+ state in mind, most of the deeper ones
implement some sort of clock gating among other optimizations, I was
just wondering whether some sort of software bug and/or the highly
intermittent CPU utilization pattern of these workloads are preventing
most of your CPU cores from entering deep sleep states.  See below.
julia

78.062895 sec
Package Core  CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ     SMI     POLL    C1      C1E     C6      POLL%   C1%     C1E%    C6%     CPU%c1  CPU%c6  CoreTmp PkgTmp  Pkg%pc2 Pkg%pc6 Pkg_J   RAM_J   PKG_%   RAM_%
-     -       -       31      2.95    1065    2096    156134  0       1971    155458  2956270 657130  0.00    0.20    4.78    92.26   14.75   82.31   40      41      45.14   0.04    4747.52 2509.05 0.00    0.00
0     0       0       13      1.15    1132    2095    11360   0       0       2       39      19209   0.00    0.00    0.01    99.01   8.02    90.83   39      41      90.24   0.04    2266.04 1346.09 0.00    0.00
This seems suspicious:                                                                                                                                                          ^^^^    ^^^^^^^

I hadn't understood that you're running this on a dual-socket system
until I looked at these results.
Sorry not to have mentioned that.
It seems like package #0 is doing
pretty much nothing according to the stats below, but it's still
consuming nearly half of your energy, apparently because the idle
package #0 isn't entering deep sleep states (Pkg%pc6 above is close to
0%).  That could explain your unexpectedly high static power consumption
and the deviation of the real maximum efficiency frequency from the one
reported by your processor, since the reported maximum efficiency ratio
cannot possibly take into account the existence of a second CPU package
with dysfunctional idle management.
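
The "nearly half" figure can be checked against the turbostat output above.
The numbers below are copied from the Pkg_J and Pkg%pc6 columns of the
summary row and the package #0 row; the variable names are only
illustrative.

```python
# Values copied from the turbostat output quoted in this message.
total_pkg_joules = 4747.52      # Pkg_J, summary row (all packages)
package0_pkg_joules = 2266.04   # Pkg_J, package #0 row
package0_pc6_percent = 0.04     # Pkg%pc6, package #0 row

share = package0_pkg_joules / total_pkg_joules
print(f"package #0 share of package energy: {share:.1%}")
print(f"package #0 residency in PC6: {package0_pc6_percent}%")
```

The share works out to roughly 47.7%, with package #0 spending almost no
time in PC6.
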
Our assumption was that if anything happens on any core, all of the
packages remain in a state that allows them to react in a reasonable
amount of time to any memory request.
I'm guessing that if you fully disable one of your CPU packages and
repeat the previous experiment forcing various P-states between 10 and
37 you should get a maximum efficiency ratio closer to the theoretical
one for this CPU?
OK, but that's not really a natural usage context...  I do have a
one-socket Intel 5220.  I'll see what happens there.

I did some experiments with forcing different frequencies.  I haven't
finished processing the results, but I notice that as the frequency goes
up, the utilization (specifically the value of
map_util_perf(sg_cpu->util) at the point of the call to
cpufreq_driver_adjust_perf in sugov_update_single_perf) goes up as well.
Is this expected?
It isn't, as long as the scale-invariance mechanism mentioned in my
previous message works properly.
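
A simplified sketch of why that is: with frequency invariance, the busy
fraction is scaled by the ratio of current to maximum frequency, so a fixed
amount of work per period yields the same utilization at any frequency.
The model below is an assumption for illustration, not the scheduler's
actual PELT code; invariant_util and its parameters are hypothetical.
map_util_perf mirrors the kernel helper as I understand it
(util + (util >> 2), i.e. 25% headroom).

```python
SCHED_CAPACITY_SCALE = 1024


def map_util_perf(util: int) -> int:
    """Add 25% headroom, as the kernel helper of the same name does."""
    return util + (util >> 2)


def invariant_util(work_cycles: float, period_s: float,
                   freq_hz: float, max_freq_hz: float) -> int:
    """Toy frequency-invariant utilization on a 0..1024 scale."""
    busy_fraction = min(1.0, work_cycles / (freq_hz * period_s))
    return round(SCHED_CAPACITY_SCALE * busy_fraction * freq_hz / max_freq_hz)


# Hypothetical workload: 1e6 cycles of work every 4 ms, on a CPU whose
# maximum frequency is assumed to be 3.7 GHz.
for freq_hz in (1.0e9, 2.1e9, 3.7e9):
    util = invariant_util(1e6, 0.004, freq_hz, 3.7e9)
    print(freq_hz, util, map_util_perf(util))
```

Under this model the value passed on after map_util_perf is the same at
every frequency, so a value that rises with frequency would point at the
invariance scaling or at the workload itself behaving differently.
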