Thread: 17 messages, 2 authors, 2017-07-12

cpufreq: frequency scaling spec in DT node

From: Mason <hidden>
Date: 2017-06-29 11:41:46
Also in: linux-pm

On 29/06/2017 12:04, Viresh Kumar wrote:
On 29-06-17, 11:48, Mason wrote:
I have two similar, but slightly different SoCs.

Firmware/bootloader sets the "nominal" CPU frequency to
So nominal here is MAX cpu frequency.
- 1215 MHz on SoC A
- 1206 MHz on SoC B

On both systems, software can reduce the CPU frequency by
writing an 8-bit integer divider to an MMIO register.

Originally, I wanted to define a small number of operating points,
defined only by the divider value, and compute the actual OPP freq
at init.

For example, use { 1, 2, 3, 5, 9 } for dividers =>
1215, 607.5, 405, 243, 135 on SoC A
1206, 603, 402, 241.2, 134 on SoC B
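
The divider arithmetic above can be sketched as follows. This is an illustrative model only, not driver code: `opp_table_khz` is a hypothetical helper, and frequencies are expressed in kHz to match the `operating-points` values used later in this message.

```python
# Dividers proposed in the message above.
DIVIDERS = (1, 2, 3, 5, 9)

def opp_table_khz(nominal_khz, dividers=DIVIDERS):
    """Return nominal_khz / divider for each divider, highest frequency first.

    Integer division is used; for the two nominal values discussed here
    every quotient happens to be exact.
    """
    return [nominal_khz // d for d in dividers]

if __name__ == "__main__":
    print("SoC A:", opp_table_khz(1215000))
    print("SoC B:", opp_table_khz(1206000))
```

Deriving the table from the measured nominal rate, rather than hard-coding it, is the idea behind computing the OPP frequencies at init.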

I'm using the generic cpufreq driver.

Binding for the generic cpufreq driver:
https://www.kernel.org/doc/Documentation/devicetree/bindings/cpufreq/cpufreq-dt.txt

I don't think there's a way to do what I want with the
existing driver, right?
No, you should rather use actual target frequency values.
It's not a big deal, I can write the actual target frequencies
in the DT.
Right.
(BTW, the OPPs are more SW than HW desc, right?)
Hmm, I wouldn't say that exactly :)

What OPP contains is mostly defined by hardware, apart from the
frequency values we are talking about. And those are decided by the
boot loaders and they are like hardware to the kernel really. They
define hardware capabilities IOW.

If you want, you can actually try implementing a ->target() type
cpufreq driver instead of ->target_index() and you will be able to
select any frequency you want. But with the above example, what you
can select is Max divided by integer value and so you can have 9
different OPPs and reuse cpufreq-dt.
But my problem is: what happens if firmware/bootloader is
changed without me knowing, and they change the nominal
frequency?
The kernel doesn't have any authority over what frequencies we are
allowed to use and we depend on the boot loader for that. If someone
changes that, screw him :)
Because of the rounding, if the nominal freq
is slightly increased, the SoC will start working at
              decreased ?
*slower* speeds.

For example, if nominal is 1215, and I request 603, I will
actually get 405.
No, you will normally get a frequency >= requested frequency with the
cpufreq governors we have.
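
A minimal sketch of the "frequency >= requested" selection rule, which cpufreq calls CPUFREQ_RELATION_L (lowest frequency at or above the target). The helper name is hypothetical, and falling back to the highest table entry when the target exceeds every entry is an assumption of this sketch, not something stated in the thread.

```python
def pick_at_or_above(table_khz, target_khz):
    """Pick the lowest table frequency that is >= target_khz.

    Assumption: if the target is above every entry, return the highest entry.
    """
    candidates = [f for f in table_khz if f >= target_khz]
    return min(candidates) if candidates else max(table_khz)

if __name__ == "__main__":
    soc_a = [135000, 243000, 405000, 607500, 1215000]
    print(pick_at_or_above(soc_a, 603000))
```

Note that this only models the table lookup inside cpufreq; what the clock hardware does with the selected frequency is a separate step.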
This effect can be seen if I define SoC B OPPs on SoC A:

$ cat scaling_available_frequencies
134000 241200 402000 603000 1206000 
/sys/devices/system/cpu/cpu0/cpufreq$ echo 603000 > scaling_max_freq
Wow. This is not how you request a frequency. What you said here is
that the MAX frequency allowed now is 603000 instead of 1206000. And
because 603000 isn't a valid frequency, we go down to 405000.

So, you should try using the userspace governor and play with
scaling_setspeed sysfs file.
I was trying to "emulate" the behavior of the ondemand governor.
Based on your reaction, I got it wrong...
Here is the actual issue:

I'm on SoC B, where nominal/max freq is expected to be 1206 MHz.
So the OPPs in the DT are:
operating-points = <1206000 0 603000 0 402000 0 241200 0 134000 0>;
*But* FW changed the max freq behind my back, to 1215 MHz.

Here is what happens when I execute:
echo ondemand >scaling_governor
sleep 2
cpuburn-a9 & cpuburn-a9 & cpuburn-a9 & cpuburn-a9
### cpuburn-a9 spins in a tight infinite loop,
### hitting all FUs to raise the CPU temperature

# cpufreq_test.sh
[   69.933874] set_target: index=4
[   69.944799] set_target: index=2
[   69.947988] clk_divider_set_rate: rate=303750000 parent_rate=1215000000 div=4
[   69.955542] set_target: index=4
[   69.958801] clk_divider_set_rate: rate=607500000 parent_rate=1215000000 div=2
[   69.984789] set_target: index=0
[   69.987980] clk_divider_set_rate: rate=121500000 parent_rate=1215000000 div=10
[   71.947597] set_target: index=4
[   71.950996] clk_divider_set_rate: rate=607500000 parent_rate=1215000000 div=2

As you can see, the divider remains stuck at 2, so the SoC
is actually running only at 607.5 MHz (instead of 1215 MHz).
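
The log above is consistent with a divider that is rounded up, so that the resulting rate never exceeds the requested one. That rounding rule is inferred from the logged values, not taken from the clk driver source, and `divider_for` is a hypothetical helper used only to model it.

```python
PARENT_HZ = 1_215_000_000  # parent_rate reported in the log

def divider_for(rate_hz, parent_hz=PARENT_HZ):
    """Smallest integer divider whose output rate does not exceed rate_hz."""
    return -(-parent_hz // rate_hz)  # ceiling division on integers

if __name__ == "__main__":
    # SoC B operating points (kHz) requested while the parent runs at 1215 MHz.
    for khz in (1206000, 603000, 402000, 241200, 134000):
        div = divider_for(khz * 1000)
        print(f"request {khz} kHz -> div={div}, actual {PARENT_HZ / div / 1e6} MHz")
```

For 1206000, 402000 and 134000 kHz this model gives dividers 2, 4 and 10, matching the three distinct `clk_divider_set_rate` lines in the log. A request for the top OPP can never reach divider 1 because 1206 MHz is slightly below the real parent rate.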

If I fix the OPPs in DT to:
operating-points = <1215000 0 607500 0 405000 0 243000 0 135000 0>;
Then I get the expected behavior:

$ cpufreq_test.sh 
[   32.717930] set_target: index=1
[   32.721131] clk_divider_set_rate: rate=243000000 parent_rate=1215000000 div=5
[   32.731326] set_target: index=4
[   32.734521] clk_divider_set_rate: rate=1215000000 parent_rate=1215000000 div=1
[   32.754556] set_target: index=0
[   32.757738] clk_divider_set_rate: rate=135000000 parent_rate=1215000000 div=9
[   32.765864] set_target: index=4
[   32.769217] clk_divider_set_rate: rate=1215000000 parent_rate=1215000000 div=1
[   33.438811] set_target: index=0
[   33.442001] clk_divider_set_rate: rate=135000000 parent_rate=1215000000 div=9
[   33.450249] set_target: index=4
[   33.453470] clk_divider_set_rate: rate=1215000000 parent_rate=1215000000 div=1
[   33.477888] set_target: index=0
[   33.481067] clk_divider_set_rate: rate=135000000 parent_rate=1215000000 div=9
[   34.714786] set_target: index=4
[   34.718237] clk_divider_set_rate: rate=1215000000 parent_rate=1215000000 div=1

Divider settles at 1 (full speed) to provide maximum
performance for the user-space processes.

My concern is that if I don't check somewhere that the
nominal frequency is as expected in the DT, the CPU might
run slower than expected (max freq cut in half).
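
The kind of check being asked about could look like the sketch below. It is a plain model, not kernel code: `check_nominal` is a hypothetical name, and how the measured parent rate would be obtained in a real driver is left open.

```python
def check_nominal(opps_khz, measured_khz):
    """Compare the highest OPP with the measured nominal rate.

    Returns None when they agree, otherwise a warning string.
    """
    expected = max(opps_khz)
    if expected == measured_khz:
        return None
    return (f"OPP table expects {expected} kHz "
            f"but nominal clock is {measured_khz} kHz")

if __name__ == "__main__":
    soc_b_opps = [1206000, 603000, 402000, 241200, 134000]
    print(check_nominal(soc_b_opps, 1215000))
```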
[   60.401883] set_target: index=3
[   60.405118] clk_divider_set_rate: rate=405000000 parent_rate=1215000000 div=3


What can I do against that?

Should I check the nominal frequency in my clk driver?
(I'm not sure reading properties of unrelated nodes is acceptable practice.)
We rely on the boot loader to get these details.

There is one thing you can do to avoid adding OPP entries in the DT.
You can rather add them dynamically with help of: dev_pm_opp_add() and
cpufreq-dt will continue to work with that too.
In what driver should I call these... the clk driver?
(drivers/clk/tegra/cvb.c seems to be doing that)

A problem might arise when I need to do voltage scaling,
though, since I also need to specify voltages, right?
But you should understand how to use the sysfs interface first and
make sure you are doing the right thing.
You're talking about this document, right?
https://www.kernel.org/doc/Documentation/cpu-freq/user-guide.txt

Regards.