Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
From: Marcus Weseloh <hidden>
Date: 2016-01-18 09:41:09
Also in:
linux-arm-kernel, linux-spi, lkml
Hi, 2016-01-17 19:51 GMT+01:00 Maxime Ripard [off-list ref]:
On Sun, Jan 10, 2016 at 10:11:11PM +0100, Marcus Weseloh wrote:quoted
quoted
quoted
quoted
quoted
- /* Ensure that we have a parent clock fast enough */ + /* + * Ensure that the parent clock is set to twice the max speed + * of the spi device (possibly rounded up by the clk driver) + */ mclk_rate = clk_get_rate(sspi->mclk); - if (mclk_rate < (2 * tfr->speed_hz)) { - clk_set_rate(sspi->mclk, 2 * tfr->speed_hz); + if (spi->max_speed_hz != sspi->cur_max_speed || + mclk_rate != sspi->cur_mclk) {Do you need to cache the values? As far as I'm aware, you end up doing the computation all the time anyway.By caching the values we optimize the case when a single SPI slave device (or even multiple slave devices with the same max_speed) are used multiple times in a row. In that case, we're saving two calls: clk_set_rate and clk_get_rate. I was unsure about how expensive the clk_* calls were, so I thought it would be safer use caching. But maybe it's not worth the extra code?Unless you can point that there's a significant performance difference, I'm not sure it's worth it.I did actually notice a significant transfer latency when a new mod0 clock frequency is set, probably due to the __delay in drivers/clk/sunxi/clk-factors.c (line 147). So my feeling is that the caching is worth it... at least for the case when there are two slave devices with different transfer speeds accessing the same SPI module.I'm not sure we even need that delay in the first place, at least not for all the clocks using factor. AFAIK, the only case where it's useful is for PLL, and all of them have a bit you can busy-wait on that will let you know when the clock has stabilized.
OK, I didn't know that. Does that mean you would like me to drop the caching from this patch and prepare a patch to remove the __delay from clk-factors?
quoted
quoted
quoted
[...]quoted
quoted
- div = mclk_rate / (2 * tfr->speed_hz); - if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) { - if (div > 0) - div--; - + div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?It is quite often, but not in all cases. The plain division rounds to the nearest integer, so it rounds down sometimes. Consider the following case: we have a slow SPI device with a spi-max-frequency of 50kHz. Our clock driver can't find a clock as slow as 100kHz, so it sets mclk to 214,285Hz. Using the old calculation we get: 214,285 / (2 * 50,000) = 2, so div = 1 as the old code subtracts 1 two lines further down The new calculation results in: DIV_ROUND_UP(214,285, 2 * 50,000) = 3, so div = 2 if we add the -1Except that you have that extra - 1 after your DIV_ROUND_UP calculation in the line you add. so you have to remove 1 from that line above, and then 1 again when we set the register, which ends up being the exact same thing, or am I missing something?The -1 after the DIV_ROUND_UP is already the -1 that is needed to set the register. There's no need for another -1 after that (and there isn't one in the code).I was probably hallucinating :) My bad. That being said, I still have a hard time figuring out what would not be solved simply by removing the div--, which seems to match what your commit log says.
I'm probably not doing a good job explaining the change. Let me try it
with a few examples. Consider the following three approaches:
A (original, unpatched version):
========================
div = mclk_rate / (2 * tfr->speed_hz);
if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
if (div > 0)
div--;
} else {
...
}
B (original version, but with div-- removed):
=================================
div = mclk_rate / (2 * tfr->speed_hz);
if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
...
} else {
...
}
C (version after this patch):
=====================
div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
if (div <= SUN4I_CLK_CTL_CDR2_MASK) {
...
} else {
...
}
And now the following values for mclk, tfr->speed and the resulting
div and SPI_CLK
tfr->speed_hz = 50,000
mclk = 214,285
A: div = 1, SPI_CLK = 53,571(!)
B: div = 2, SPI_CLK = 35,714
C: div = 2, SPI_CLK = 35,714
tfr->speed_hz = 50,000
mclk = 200,000
A: div = 1, SPI_CLK = 50,000
B: div = 2, SPI_CLK = 33,333(!)
C: div = 1, SPI_CLK = 50,000
tfr->speed_hz = 650,000
mclk = 1,6000,000
A: div = 11, SPI_CLK = 666,667(!)
B: div = 12, SPI_CLK = 615,385
C: div = 12, SPI_CLK = 615,385
tfr->speed_hz = 1,000,000
mclk = 2,000,000
A: div = 0, SPI_CLK = 1,000,000
B: div = 1, SPI_CLK = 500,000(!)
C: div = 0, SPI_CLK = 1,000,000
I hope that makes it a little bit easier to understand the change. Of
course, the change only makes sense if you agree that the acutal SPI
transfer speed should never exceed the requested speed. Which I think
is the right approach... but maybe you think otherwise?
Thanks for taking the time to look at this so carefully!
Marcus