Thread: 40 messages, 4 authors, 2021-10-21

RE: [PATCH v2 21/21] Documentation: amd-pstate: add amd-pstate driver introduction

From: "Huang, Ray" <Ray.Huang@amd.com>
Date: 2021-10-14 11:30:42
Also in: lkml

[AMD Official Use Only]
-----Original Message-----
From: Giovanni Gherdovich <redacted>
Sent: Thursday, October 14, 2021 12:23 AM
To: Huang, Ray <Ray.Huang@amd.com>; Rafael J . Wysocki
[off-list ref]; Viresh Kumar [off-list ref];
Shuah Khan [off-list ref]; Borislav Petkov [off-list ref];
Peter Zijlstra [off-list ref]; Ingo Molnar [off-list ref];
linux-pm@vger.kernel.org
Cc: Sharma, Deepak <redacted>; Deucher, Alexander
[off-list ref]; Limonciello, Mario
[off-list ref]; Fontenot, Nathan
[off-list ref]; Su, Jinzhou (Joe) [off-list ref];
Du, Xiaojian [off-list ref]; linux-kernel@vger.kernel.org;
x86@kernel.org
Subject: Re: [PATCH v2 21/21] Documentation: amd-pstate: add amd-pstate
driver introduction

On Sun, 2021-09-26 at 17:06 +0800, Huang Rui wrote:
Introduce the amd-pstate driver design and implementation.

Signed-off-by: Huang Rui <ray.huang@amd.com>
---
 Documentation/admin-guide/pm/amd_pstate.rst   | 377 ++++++++++++++++++
[... snip ...]
+
+AMD CPPC Performance Capability
+--------------------------------
+
+Highest Performance (RO)
+.........................
+
+It is the absolute maximum performance an individual processor may
+reach, assuming ideal conditions. This performance level may not be
+sustainable for long durations and may only be achievable if other
+platform components are in a specific state; for example, it may
+require other processors be in an idle state. This would be
+equivalent to the highest frequencies supported by the processor.
+
+Nominal (Guaranteed) Performance (RO)
+......................................
+
+It is the maximum sustained performance level of the processor,
+assuming ideal operating conditions. In absence of an external
+constraint (power, thermal, etc.) this is the performance level the
+processor is expected to be able to maintain continuously. All
+cores/processors are expected to be able to sustain their nominal
+performance state simultaneously.
+
+Lowest non-linear Performance (RO)
+...................................
+
+It is the lowest performance level at which nonlinear power savings
+are achieved, for example, due to the combined effects of voltage and
+frequency scaling. Above this threshold, lower performance levels
+should be generally more energy efficient than higher performance
+levels. This register effectively conveys the most efficient performance
+level to ``amd-pstate``.
+
+Lowest Performance (RO)
+........................
+
+It is the absolute lowest performance level of the processor.
+Selecting a performance level lower than the lowest nonlinear
+performance level may cause an efficiency penalty but should reduce
+the instantaneous power consumption of the processor.
+
Those above are the CPPC capabilities. All good so far. They're Read Only,
and for each capability you have a file in sysfs. It makes sense to describe
them in this Documentation folder ("admin-guide"). But the following
section...
+AMD CPPC Performance Control
+------------------------------
+
+``amd-pstate`` passes performance goals through these registers. The
+register drives the behavior of the desired performance target.
+
+Minimum requested performance (RW)
+...................................
+
+``amd-pstate`` specifies the minimum allowed performance level.
+
+Maximum requested performance (RW)
+...................................
+
+``amd-pstate`` specifies a limit on the maximum performance that is
+expected to be supplied by the hardware.
+
+Desired performance target (RW)
+...................................
+
+``amd-pstate`` specifies a desired target in the CPPC performance
+scale as a relative number. This can be expressed as percentage of
+nominal performance (infrastructure max). Below the nominal sustained
+performance level, desired performance expresses the average
+performance level of the processor subject to hardware. Above the
+nominal performance level, processor must provide at least nominal
+performance requested and go higher if current operating conditions
+allow.
+
+Energy Performance Preference (EPP) (RW)
+.........................................
+
+Provides a hint to the hardware if software wants to bias toward
+performance (0x0) or energy efficiency (0xff).
The section above describes the CPPC "performance controls". They're
marked "Read/Write", but you don't expose them to the user via sysfs, am I
right?
Yes, because we use the kernel governors to manage the "performance controls".
Do I understand correctly that with this driver, the AMD System Management
Unit (SMU -- is it the right name?) is *not* working in autonomous mode, but
is almost entirely under the OS control?

By "autonomous mode" I mean: you run a workload, the driver doesn't select
any desired frequency, and the SMU does its thing and selects the CPU clock
freq on its own. That's not what's happening here, AFAIU. I tried using
amd-pstate using the "userspace" governor (very useful for testing ;), and set
frequencies like

  echo 1200000 > /sys/devices/system/cpu/cpufreq/policy11/scaling_setspeed

and then, whatever the load on CPU#11, "cpupower monitor" would show
me a constant clock of ~1.2GHz.
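
For anyone who wants to repeat the experiment above from a script, here is a minimal Python sketch. It is not part of the original exchange. It assumes the standard cpufreq sysfs files (`scaling_governor`, `scaling_setspeed`, `scaling_cur_freq`) and root privileges; the helper name `pin_frequency` and the `root` parameter are hypothetical, added only so the function can be pointed at a test directory.

```python
from pathlib import Path

# Standard cpufreq sysfs location; pass a different root to test without hardware.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def pin_frequency(policy: int, khz: int, root: Path = CPUFREQ_ROOT) -> int:
    """Select the "userspace" governor for one policy, request `khz`,
    and return the frequency (in kHz) that cpufreq reports afterwards."""
    policy_dir = root / f"policy{policy}"
    # scaling_setspeed only accepts writes while the userspace governor is active.
    (policy_dir / "scaling_governor").write_text("userspace\n")
    (policy_dir / "scaling_setspeed").write_text(f"{khz}\n")
    # Read back what cpufreq reports for this policy.
    return int((policy_dir / "scaling_cur_freq").read_text().strip())
```

Calling `pin_frequency(11, 1200000)` as root would mirror the `echo` command above for policy11. The value read back is what cpufreq reports, which is not necessarily the same measurement that "cpupower monitor" shows.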

Don't get me wrong, this is a very good driver! I'm super happy that the
kernel can finally see all the P-States, instead of just 3.

I'm just trying to clarify that we're using CPPC with autonomous selection
disabled, so I don't think the documentation in admin-guide should describe
features like the R/W "performance controls" that don't make sense in this
context. Especially the "Energy Performance Preference (EPP)", that you
would use to tell the SMU "do what you want, just push a little on the
performance side".
No problem! 😊 Actually, with this driver the kernel governor and the AMD SMU Arbiter manage the target frequency together.
A kernel governor such as "schedutil" uses load information from the Linux CFS scheduler to calculate a reasonable desired performance value.
The amd-pstate driver then passes the governor's request to the SMU CPU clock DPM Arbiter through the "performance controls"; the SMU firmware can detect these MSR operations as they happen.
Finally, the SMU calculates the final target frequency in hardware.
I can see that the driver, internally, is sending "lowest nonlinear" as minimum
perf, 255 as maximum perf, and whatever the governor wants as desired perf.
It just isn't exposed in sysfs so there isn't much point in documenting that.
I will add more descriptions in the RST documentation in V3. Thank you for your suggestion!
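
To make the min/max/desired request discussed above concrete, the following Python sketch packs the four performance controls into a single 32-bit request value. It is an illustration only, not the driver's code. It assumes the field layout I believe the kernel's AMD_CPPC_* macros use for the CPPC request register (max perf in bits 7:0, min perf in bits 15:8, desired perf in bits 23:16, EPP in bits 31:24); check that layout against the patch series before relying on it. The function name and the sample values are hypothetical.

```python
def pack_cppc_req(min_perf: int, max_perf: int, des_perf: int, epp: int = 0) -> int:
    """Pack the CPPC performance controls into one 32-bit request value.

    Assumed layout (verify against the patch series):
      bits  7:0   maximum requested performance
      bits 15:8   minimum requested performance
      bits 23:16  desired performance target
      bits 31:24  energy performance preference (EPP)
    """
    fields = {"min_perf": min_perf, "max_perf": max_perf,
              "des_perf": des_perf, "epp": epp}
    for name, value in fields.items():
        # Every control is an 8-bit quantity on the CPPC performance scale.
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name}={value} does not fit in 8 bits")
    return max_perf | (min_perf << 8) | (des_perf << 16) | (epp << 24)


# Hypothetical values: lowest nonlinear perf 63 as the minimum, 255 as the
# maximum, and a governor-chosen desired perf of 120.
request = pack_cppc_req(min_perf=63, max_perf=255, des_perf=120)
print(hex(request))  # 0x783fff
```

With EPP left at 0x0 the top byte stays zero, which matches the observation that the driver does not use the EPP hint in this mode.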
[...]
Full MSR Support
-----------------

Some new Zen3 processors such as Cezanne provide the MSR registers
directly while the :c:macro:`X86_FEATURE_AMD_CPPC_EXT` CPU feature
flag is set.
``amd-pstate`` can handle the MSR register to implement the fast
switch function in ``CPUFreq`` that can shrink latency of frequency
control on the interrupt context.
A-ha! Cezanne. I have an EPYC Milan, so that's probably why I can't get the
"Full MSR Support". I'll test the "Shared Memory Support" then, and report
my data.
Looking forward to your result data. 😊

Thanks,
Ray