Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and... | linux-devicetree

Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate

From: Marcus Weseloh <hidden>
Date: 2016-01-18 09:41:09
Also in: linux-arm-kernel, linux-spi, lkml

Hi,

2016-01-17 19:51 GMT+01:00 Maxime Ripard [off-list ref]:

On Sun, Jan 10, 2016 at 10:11:11PM +0100, Marcus Weseloh wrote:

quoted

-     /* Ensure that we have a parent clock fast enough */
+     /*
+      * Ensure that the parent clock is set to twice the max speed
+      * of the spi device (possibly rounded up by the clk driver)
+      */
      mclk_rate = clk_get_rate(sspi->mclk);
-     if (mclk_rate < (2 * tfr->speed_hz)) {
-             clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
+     if (spi->max_speed_hz != sspi->cur_max_speed ||
+         mclk_rate != sspi->cur_mclk) {

Do you need to cache the values? As far as I'm aware, you end up doing
the computation all the time anyway.

By caching the values we optimize the case when a single SPI slave
device (or even multiple slave devices with the same max_speed) are
used multiple times in a row. In that case, we're saving two calls:
clk_set_rate and clk_get_rate. I was unsure about how expensive the
clk_* calls were, so I thought it would be safer use caching. But
maybe it's not worth the extra code?

Unless you can point that there's a significant performance
difference, I'm not sure it's worth it.

I did actually notice a significant transfer latency when a new mod0
clock frequency is set, probably due to the __delay in
drivers/clk/sunxi/clk-factors.c (line 147). So my feeling is that the
caching is worth it... at least for the case when there are two slave
devices with different transfer speeds accessing the same SPI module.

I'm not sure we even need that delay in the first place, at least not
for all the clocks using factor.

AFAIK, the only case where it's useful is for PLL, and all of them
have a bit you can busy-wait on that will let you know when the clock
has stabilized.

OK, I didn't know that. Does that mean you would like me to drop the
caching from this patch and prepare a patch to remove the __delay from
clk-factors?

quoted

[...]

quoted

-     div = mclk_rate / (2 * tfr->speed_hz);
-     if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
-             if (div > 0)
-                     div--;
-
+     div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;

Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?

It is quite often, but not in all cases. The plain division rounds to
the nearest integer, so it rounds down sometimes. Consider the
following case: we have a slow SPI device with a spi-max-frequency of
50kHz. Our clock driver can't find a clock as slow as 100kHz, so it
sets mclk to 214,285Hz.

Using the old calculation we get: 214,285 / (2 * 50,000) = 2, so div =
1 as the old code subtracts 1 two lines further down
The new calculation results in: DIV_ROUND_UP(214,285, 2 * 50,000)  =
3, so div = 2 if we add the -1

Except that you have that extra - 1 after your DIV_ROUND_UP
calculation in the line you add.  so you have to remove 1 from that
line above, and then 1 again when we set the register, which ends up
being the exact same thing, or am I missing something?

The -1 after the DIV_ROUND_UP is already the -1 that is needed to set
the register. There's no need for another -1 after that (and there
isn't one in the code).

I was probably hallucinating :) My bad.

That being said, I still have a hard time figuring out what would not
be solved simply by removing the div--, which seems to match what your
commit log says.

I'm probably not doing a good job explaining the change. Let me try it
with a few examples. Consider the following three approaches:

A (original, unpatched version):
========================
div = mclk_rate / (2 * tfr->speed_hz);
if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
        if (div > 0)
               div--;
} else {
...
}

B (original version, but with div-- removed):
=================================
div = mclk_rate / (2 * tfr->speed_hz);
if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
   ...
} else {
...
}

C (version after this patch):
=====================
div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
if (div <= SUN4I_CLK_CTL_CDR2_MASK) {
   ...
} else {
   ...
}

And now the following values for mclk, tfr->speed and the resulting
div and SPI_CLK

tfr->speed_hz = 50,000
mclk = 214,285
A: div = 1, SPI_CLK = 53,571(!)
B: div = 2, SPI_CLK = 35,714
C: div = 2, SPI_CLK = 35,714

tfr->speed_hz = 50,000
mclk = 200,000
A: div = 1, SPI_CLK = 50,000
B: div = 2, SPI_CLK = 33,333(!)
C: div = 1, SPI_CLK = 50,000

tfr->speed_hz = 650,000
mclk = 1,6000,000
A: div = 11, SPI_CLK = 666,667(!)
B: div = 12, SPI_CLK = 615,385
C: div = 12, SPI_CLK = 615,385

tfr->speed_hz = 1,000,000
mclk = 2,000,000
A: div = 0, SPI_CLK = 1,000,000
B: div = 1, SPI_CLK = 500,000(!)
C: div = 0, SPI_CLK = 1,000,000

I hope that makes it a little bit easier to understand the change. Of
course, the change only makes sense if you agree that the acutal SPI
transfer speed should never exceed the requested speed. Which I think
is the right approach... but maybe you think otherwise?

Thanks for taking the time to look at this so carefully!

Marcus

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help