Thread (34 messages) 34 messages, 4 authors, 2020-09-07

Re: fsl_espi errors on v5.7.15

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: 2020-08-25 07:22:58
Also in: linux-spi, lkml

On 25.08.2020 05:54, Chris Packham wrote:
On 25/08/20 10:04 am, Chris Packham wrote:
quoted
On 20/08/20 9:08 am, Chris Packham wrote:
quoted
On 19/08/20 6:15 pm, Heiner Kallweit wrote:
quoted
On 19.08.2020 00:44, Chris Packham wrote:
quoted
Hi Again,

On 17/08/20 9:09 am, Chris Packham wrote:
quoted
On 14/08/20 6:19 pm, Heiner Kallweit wrote:
quoted
On 14.08.2020 04:48, Chris Packham wrote:
quoted
Hi,

I'm seeing a problem with accessing spi-nor after upgrading a T2081
based system to linux v5.7.15

For this board u-boot and the u-boot environment live on spi-nor.

When I use fw_setenv from userspace I get the following kernel logs

# fw_setenv foo=1
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't 
empty!
fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't 
empty!
fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't 
empty!
fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
...
This error reporting doesn't exist yet in 4.4. So you may have an 
issue
under 4.4 too, it's just not reported.
Did you verify that under 4.4 fw_setenv actually has an effect?
Just double checked and yes under 4.4 the setting does get saved.
quoted
quoted
If I run fw_printenv (before getting it into a bad state) it is 
able to
display the content of the boards u-boot environment.
This might indicate an issue with spi being locked. I've seen 
related
questions, just use the search engine of your choice and check for
fw_setenv and locked.
I'm running a version of fw_setenv which includes
https://gitlab.denx.de/u-boot/u-boot/-/commit/db820159 so it 
shouldn't
be locking things unnecessarily.
quoted
quoted
If been unsuccessful in producing a setup for bisecting the 
issue. I do
know the issue doesn't occur on the old 4.4.x based kernel but 
that's
probably not much help.

Any pointers on what the issue (and/or solution) might be.
I finally managed to get our board running with a vanilla kernel. With
corenet64_smp_defconfig I occasionally see

    fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!

other than the message things seem to be working.

With a custom defconfig I see

    fsl_espi ffe110000.spi: Transfer done but SPIE_DON isn't set!
    fsl_espi ffe110000.spi: Transfer done but rx/tx fifo's aren't 
empty!
    fsl_espi ffe110000.spi: SPIE_RXCNT = 1, SPIE_TXCNT = 32
    ...

and access to the spi-nor does not work until the board is reset.

I'll try and pick apart the differences between the two defconfigs.
I now think my earlier testing is invalid. I have seen the problem 
with either defconfig if I try hard enough. I had convinced myself 
that the problem was CONFIG_PREEMPT but that was before I found 
boot-to-boot differences with the same kernel.

It's possible that I'm chasing multiple issues with the same symptom.

The error I'm most concerned with is in the sequence
1. boot with old image
2. write environment
3. boot with new image
4. write environment
5. write fails and environment is corrupted

After I recover the system things sometimes seem fine. Until I repeat 
the sequence above.
quoted
Also relevant may be:
- Which dts are you using?
Custom but based heavily on the t2080rdb.
quoted
- What's the spi-nor type, and at which frequency are you operating it?
The board has several alternate parts for the spi-nor so the dts just 
specifies compatible = "jedec,spi-nor" the actual chip detected on 
the board I have is "n25q032a (4096 Kbytes)". The dts sets 
spi-max-frequency = <10000000> I haven't measured the actual 
frequency on the bus.
quoted
- Does the issue still happen if you lower the frequency?
I did play around with the frequency initially but I should probably 
give that another go now that I have a better reproduction method.
Playing around with the frequency didn't help.

One thing that I've found is that the problem appears to be that I end 
up with extra bytes in the RX FIFO. If I add code to drain the RX FIFO 
then the system is able to keep accessing the spi-nor (albeit with 
some noisy logs).
I've been staring at spi-fsl-espi.c for while now and I think I've 
identified a couple of deficiencies that may or may not be related to my 
issue.

First I think the 'Transfer done but SPIE_DON isn't set' message can be 
generated spuriously. In fsl_espi_irq() we read the ESPI_SPIE register. 
We also write back to it to clear the current events. We re-read it in 
fsl_espi_cpu_irq() and complain when SPIE_DON is not set. But we can 
naturally end up in that situation if we're doing a large read. Consider 
the messages for reading a block of data from a spi-nor chip

  tx = READ_OP + ADDR
  rx = data

We setup the transfer and pump out the tx_buf. The first interrupt goes 
off and ESPI_SPIE has SPIM_DON and SPIM_RXT set. We empty the rx fifo, 
clear ESPI_SPIE and wait for the next interrupt. The next interrupt 
fires and this time we have ESPI_SPIE with just SPIM_RXT set. This 
continues until we've received all the data and we finish with ESPI_SPIE 
having only SPIM_RXT set. When we re-read it we complain that SPIE_DON 
isn't set.

The other deficiency is that we only get an interrupt when the amount of 
data in the rx fifo is above FSL_ESPI_RXTHR. If there are fewer than 
FSL_ESPI_RXTHR left to be received we will never pull them out of the fifo.
SPIM_DON will trigger an interrupt once the last characters have been
transferred, and read the remaining characters from the FIFO.
I think the reason I'm seeing some variability is because of how fast 
(or slow) the interrupts get processed and how fast the spi-nor chip can 
fill the CPUs rx fifo.
To rule out timing issues at high bus frequencies I initially asked
for re-testing at lower frequencies. If you e.g. limit the bus to 1 MHz
or even less, then timing shouldn't be an issue.

Last relevant functional changes have been done almost 4 years ago.
And yours is the first such report I see. So question is what could be so
special with your setup that it seems you're the only one being affected.
The scenarios you describe are standard, therefore much more people
should be affected in case of a driver bug.

You said that kernel config impacts how frequently the issue happens.
Therefore question is what's the diff in kernel config, and how could
the differences be related to SPI.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help