Thread (11 messages), 3 authors, 2017-01-19

[linux-sunxi] [PATCH] clk: sunxi-ng: fix PLL_CPUX adjusting on H3

From: Maxime Ripard <hidden>
Date: 2017-01-16 16:44:21
Also in: linux-clk, lkml

Hi Ondrej,

Sorry for the late reply,

On Mon, Jan 09, 2017 at 03:50:42PM +0100, Ondřej Jirman wrote:
On 9.1.2017 at 10:59, Maxime Ripard wrote:
On Sat, Jan 07, 2017 at 04:49:18PM +0100, Ondřej Jirman wrote:
Maxime,

On 25.11.2016 at 01:28, megous at megous.com wrote:
From: Ondrej Jirman <redacted>

When adjusting PLL_CPUX on H3, the PLL is temporarily driven
too high, and the system becomes unstable (oopses or hangs).

Add a notifier to avoid this situation by temporarily switching
to a known stable 24 MHz oscillator.
I have done more thorough testing on H3 and this approach with switching
to 24MHz oscillator does not work. Motivation being that my Orange Pi
One still gets lockups even with this patch under certain circumstances.

So I have created a small test program for CPUS (additional OpenRISC CPU
on the SoC) which randomly changes PLL_CPUX settings while main CPU is
running a loop that sends messages to CPUS via msgbox.

Assumption being that while CPUS is successfully receiving messages via
msgbox, the main CPU didn't lock up, yet.

With this I am able to quickly and thoroughly test various PLL_CPUX
change and factor selection algorithms.

Results are that bypassing CPUX clock by switching to 24 MHz oscillator
does not work at all. Main CPU locks up in about 1 second into the test.
Don't ask me why.
You mean that you are changing the frequency behind Linux' back? That
won't work. There's more to cpufreq than just changing the frequency,
but also adjusting the number of loops per jiffy for the new frequency
for example. I don't really expect that setup to work even on a
perfectly stable system. CPUFreq *has* to be involved, otherwise, that
alone might introduce bugs, and you cannot draw any conclusions
anymore.
No, this has nothing to do with linux. I'm not running linux for this
test. I'm running a small program on CPUS (OpenRISC CPU) on the SoC
loaded using FEL from USB.

The main cpu is just pushing messages into msgbox in a loop, so that
CPUS can determine that the main CPU is still running ok and give
feedback to me over UART. Not even DRAM is involved. The programs are
running from SRAM.

This is the most direct test of PLL change stability that can be done on
this SoC regardless of the OS. Not even CPU voltage switching is
involved. I just set the maximum voltage and fiddle with CPU_PLL
frequencies randomly, while waiting for the main CPU to lock up.
Ok.
It does lock up quickly with mainline ccu_nkmp_find_best algorithm
for finding factors.

Even with linux kernel, it breaks. It's just more difficult to hit the
right conditions. I got oops only right after boot when running cpuburn
to trigger thermal_zone issued OPP change, if I first run some cpupower
commands. That's why I wrote this program to stress test various CPU_PLL
change/factor selection algorithms independently of everything else, to
get more predictable and quicker testing results.
Understood. Do you have the code available somewhere?
What works is selecting NKMP factors so that M is always 1 and P is
anything other than /1 only for frequencies under 288MHz. As mandated by
the H3 datasheet. Mainline ccu_nkmp_find_best doesn't respect these
conditions. With that I can change CPUX frequencies randomly 20x a
second so far indefinitely without the main CPU ever locking up.
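
For illustration, the constraint described above (M fixed at 1, P other than /1 only below 288 MHz) could be expressed as a restricted factor search. This is only a sketch, not the mainline ccu_nkmp_find_best code: the struct and function names are hypothetical, and the factor ranges (N 1..32, K 1..4, P in {1, 2, 4}) and the formula rate = 24 MHz * N * K / (M * P) are assumptions about the PLL that should be checked against the H3 datasheet. Any further hardware limits are not modelled.

```c
#include <stdint.h>

#define PARENT_HZ  24000000ULL   /* 24 MHz oscillator (from the thread) */
#define P_LIMIT_HZ 288000000ULL  /* P > 1 allowed only below this rate */

/* Hypothetical container for one set of PLL factors. */
struct nkmp {
    unsigned n, k, m, p;
};

/* Assumed rate formula: parent * N * K / (M * P). */
static uint64_t nkmp_rate(struct nkmp f)
{
    return PARENT_HZ * f.n * f.k / ((uint64_t)f.m * f.p);
}

/*
 * Find the factors giving the highest rate not above target_hz,
 * with M always 1 and P restricted to 1 at or above 288 MHz.
 * Assumed ranges: N 1..32, K 1..4, P in {1, 2, 4}.
 * If no combination fits, the lowest P = 1 setting is returned.
 */
static struct nkmp find_factors(uint64_t target_hz)
{
    struct nkmp best = { 1, 1, 1, 1 };
    uint64_t best_rate = 0;
    unsigned max_p = (target_hz < P_LIMIT_HZ) ? 4 : 1;

    for (unsigned p = 1; p <= max_p; p *= 2) {
        for (unsigned k = 1; k <= 4; k++) {
            for (unsigned n = 1; n <= 32; n++) {
                struct nkmp cand = { n, k, 1, p };  /* M is never changed */
                uint64_t rate = nkmp_rate(cand);

                if (rate <= target_hz && rate > best_rate) {
                    best = cand;
                    best_rate = rate;
                }
            }
        }
    }
    return best;
}
```

Because P = 1 is tried first and a candidate only replaces the current best when it is strictly closer to the target, a divider other than /1 is chosen only when it actually improves the match.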

Please drop or revert this patch. It is not a correct approach to the
problem. I'd suggest dropping the entire clock notifier mechanism, too,
unless it can be proven to work reliably.
It has been proven to work reliably on a number of other SoCs.
Unless it was stress tested like this with randomly changed settings, I
doubt you can call it reliable. It may just be very hard to hit the
issue on linux with particular OPP/thermal zone configuration. That's
because the issue is dependent on before and after NKMP values. People
may have just been lucky so far.
Yes, or maybe we just have OPPs that just don't trigger a low enough P
factor.

There's no rush anyway: the H3 cpufreq support is not enabled at the
moment, so that code basically does nothing for now.

What's your current plan to fix that? I guess the easiest (and most
likely to be reusable) would be to allow for clock tables, instead of
using the generic approach. We might have some other clocks (like
audio or video) that would need such a precise tuning in the future
too.
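
As a rough illustration of the clock-table idea, a lookup over a fixed list of known-good settings might look like the sketch below. Everything in it is hypothetical: the struct, the function name, and the table entries are made up for the example, the entries have not been validated on hardware, and the formula rate = 24 MHz * N * K / (M * P) is an assumption. It is not an existing sunxi-ng interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: a target rate and the factors producing it. */
struct pll_table_entry {
    uint64_t rate_hz;
    unsigned n, k, m, p;
};

/*
 * Illustrative entries only, sorted by ascending rate.
 * Each one assumes rate = 24 MHz * N * K / (M * P), keeps M at 1,
 * and uses P > 1 only below 288 MHz.
 */
static const struct pll_table_entry cpux_table[] = {
    {   60000000ULL,  5, 1, 1, 2 },
    {  120000000ULL,  5, 1, 1, 1 },
    {  240000000ULL, 10, 1, 1, 1 },
    {  480000000ULL, 20, 1, 1, 1 },
    {  816000000ULL, 17, 2, 1, 1 },
    { 1008000000ULL, 21, 2, 1, 1 },
};

/*
 * Return the highest table entry whose rate does not exceed target_hz,
 * or NULL if the target is below the lowest entry.
 */
static const struct pll_table_entry *pll_table_lookup(uint64_t target_hz)
{
    const struct pll_table_entry *best = NULL;
    size_t count = sizeof(cpux_table) / sizeof(cpux_table[0]);

    for (size_t i = 0; i < count; i++) {
        if (cpux_table[i].rate_hz > target_hz)
            break;  /* table is sorted, nothing later can match */
        best = &cpux_table[i];
    }
    return best;
}
```

With a table, only factor combinations that were explicitly listed can ever be programmed, at the cost of maintaining one table per clock that needs it.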

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com