Thread: 30 messages, 4 authors, 2021-03-18

RE: Errant readings on LM81 with T2080 SoC

From: David Laight <hidden>
Date: 2021-03-15 09:47:05
Also in: linux-hwmon, linux-i2c, lkml

From: Chris Packham
Sent: 14 March 2021 21:26

On 12/03/21 10:25 pm, David Laight wrote:
From: Linuxppc-dev Guenter Roeck
Sent: 11 March 2021 21:35

On 3/11/21 1:17 PM, Chris Packham wrote:
On 11/03/21 9:18 pm, Wolfram Sang wrote:
Bummer. What is really weird is that you see clock stretching under
CPU load. Normally clock stretching is triggered by the device, not
by the host.
One example: Some hosts need an interrupt per byte to know if they
should send ACK or NACK. If that interrupt is delayed, they stretch the
clock.
It feels like something like that is happening. Looking at the T2080
Reference manual there is an interesting timing diagram (Figure 14-2 if
someone feels like looking it up). It shows SCL low between the ACK for
the address and the data byte. I think if we're delayed in sending the
next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.
I think that really leaves you only two options that I can see:
Rework the driver to handle critical actions (such as setting TXAK,
and everything else that might result in clock stretching) in the
interrupt handler, or rework the driver to handle everything in
a high priority kernel thread.
I'm not sure a high priority kernel thread will help.
Without CONFIG_PREEMPT (which has its own set of nasties)
a RT process won't be scheduled until the processor it last
ran on does a reschedule.
I don't think a kernel thread will be any different from a
user process running under the RT scheduler.

I'm trying to remember the smbus spec (without remembering the I2C one).
For those following along the spec is available here[0]. I know there's
a 3.0 version[1] as well but the devices I'm dealing with are from a 2.0
vintage.
While basically a clock+data bit-bang the slave is allowed to drive
the clock low to extend a cycle.
It may be allowed to do this at any point?
From what I can see it's actually the master extending the clock. Or
more accurately holding it low between the address and data bytes (which
from the T2080 reference manual looks expected). I think this may cause
a strictly compliant SMBUS device to determine that Tlow:mext has been
violated.
Yes, the spec does seem to assume that if a signal is stable
for 20ms something has gone 'horribly wrong'.
I wasn't worried about that, our fpga does the whole transaction
as a single command.
None of our slaves generate interrupts - so it is purely master/slave.
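
For anyone following along, here is a rough sketch of the two limits being discussed, as I read the timing table in the SMBus 2.0 spec: Ttimeout (clock held low for more than 25ms, with devices required to reset by 35ms) and Tlow:mext (the master's cumulative clock-low extension within one byte, 10ms max). The macro and function names are made up for illustration and do not come from any driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limits from the SMBus 2.0 timing table, in microseconds.
 * The names are hypothetical, not taken from any kernel header. */
#define SMBUS_TTIMEOUT_MIN_US   25000u  /* SCL low longer than this is a timeout */
#define SMBUS_TTIMEOUT_MAX_US   35000u  /* device must have reset by this point */
#define SMBUS_TLOW_SEXT_MAX_US  25000u  /* slave cumulative extension, per message */
#define SMBUS_TLOW_MEXT_MAX_US  10000u  /* master cumulative extension, per byte */

/* True if the master stretched the clock for longer than Tlow:mext
 * allows within a single byte (START-to-ACK, ACK-to-ACK or ACK-to-STOP). */
static bool smbus_mext_violated(uint32_t master_low_ext_us)
{
	return master_low_ext_us > SMBUS_TLOW_MEXT_MAX_US;
}

/* True if SCL has been held low long enough that a strictly
 * compliant device is entitled to treat it as a bus timeout. */
static bool smbus_clock_low_timeout(uint32_t scl_low_us)
{
	return scl_low_us > SMBUS_TTIMEOUT_MIN_US;
}
```

So a master that sits with SCL low for, say, 12ms between the address ACK and the data byte is still inside Ttimeout but already outside Tlow:mext.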

If you run your process under the RT scheduler it is unlikely
that pre-emption will be delayed by long enough to stop the process
running for 10ms.
I've seen >1ms delays (testing RTP audio), but most of the long
loops have a cond_resched() in them.
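
As a minimal sketch of what "running under the RT scheduler" means for a user process on Linux, something like the following moves the caller to SCHED_FIFO. The helper names are hypothetical, the priority value is arbitrary, and the call needs CAP_SYS_NICE (or a suitable RLIMIT_RTPRIO), so expect EPERM otherwise.

```c
#include <sched.h>

/* Clamp a requested priority into the range the kernel
 * accepts for SCHED_FIFO. */
static int clamp_fifo_priority(int prio)
{
	int lo = sched_get_priority_min(SCHED_FIFO);
	int hi = sched_get_priority_max(SCHED_FIFO);

	if (prio < lo)
		return lo;
	if (prio > hi)
		return hi;
	return prio;
}

/* Put the calling process under SCHED_FIFO.
 * Returns 0 on success, -1 with errno set on failure
 * (typically EPERM when run without privileges). */
static int make_realtime(int prio)
{
	struct sched_param sp = { .sched_priority = clamp_fifo_priority(prio) };

	return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

This only changes which runnable task gets picked. It does nothing about how long the kernel takes to reach a reschedule point, which is the delay that matters here.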

...
Probably depends on the device implementation. I've got multiple other
I2C/SMBUS devices and the LM81 seems to be the one that objects.
I bet most don't implement any of the timeouts.

I found one interesting pmbus device.
Sometimes it would detect a STOP condition because the data line
went high when it tri-stated its output driver in response to the
rising clock edge!
So it saw the same clock edge twice.
[0] - http://www.smbus.org/specs/smbus20.pdf
[1] - https://pmbus.org/Assets/PDFS/Public/SMBus_3_0_20141220.pdf
I should have both those - I've copied them to the directory where
I'd look for them first!

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)