RE: Errant readings on LM81 with T2080 SoC
From: David Laight <hidden>
Date: 2021-03-15 09:47:05
Also in:
linux-hwmon, linux-i2c, lkml
From: Chris Packham
Sent: 14 March 2021 21:26
On 12/03/21 10:25 pm, David Laight wrote:
From: Linuxppc-dev Guenter Roeck
Sent: 11 March 2021 21:35
On 3/11/21 1:17 PM, Chris Packham wrote:
On 11/03/21 9:18 pm, Wolfram Sang wrote:
Bummer. What is really weird is that you see clock stretching under CPU load. Normally clock stretching is triggered by the device, not by the host.

One example: Some hosts need an interrupt per byte to know if they should send ACK or NACK. If that interrupt is delayed, they stretch the clock.

It feels like something like that is happening. Looking at the T2080 Reference manual there is an interesting timing diagram (Figure 14-2 if someone feels like looking it up). It shows SCL low between the ACK for the address and the data byte. I think if we're delayed in sending the next byte we could violate Ttimeout or Tlow:mext from the SMBUS spec.

I think that really leaves you only two options that I can see: Rework the driver to handle critical actions (such as setting TXAK, and everything else that might result in clock stretching) in the interrupt handler, or rework the driver to handle everything in a high priority kernel thread.

I'm not sure a high priority kernel thread will help. Without CONFIG_PREEMPT (which has its own set of nasties) a RT process won't be scheduled until the processor it last ran on does a reschedule. I don't think a kernel thread will be any different from a user process running under the RT scheduler.

I'm trying to remember the smbus spec (without remembering the I2C one).
For those following along the spec is available here[0]. I know there's a 3.0 version[1] as well but the devices I'm dealing with are from a 2.0 vintage.

While basically a clock+data bit-bang the slave is allowed to drive the clock low to extend a cycle. It may be allowed to do this at any point?

From what I can see it's actually the master extending the clock. Or more accurately holding it low between the address and data bytes (which from the T2080 reference manual looks expected). I think this may cause a strictly compliant SMBUS device to determine that Tlow:mext has been violated.
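
To make the Tlow:mext concern concrete, here is a minimal sketch of how a strictly compliant device might judge one byte. SMBus 2.0 gives Ttimeout a 25 ms minimum and limits Tlow:mext, the cumulative master clock-low extension within a byte, to 10 ms. The function and struct names are hypothetical, and treating anything beyond the nominal SCL low time as master extension is an assumption of this sketch, not something taken from the LM81 or the T2080 driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SMBus 2.0 timing limits, in microseconds. */
#define SMBUS_T_TIMEOUT_MIN_US  25000u /* a device may reset once SCL is low this long */
#define SMBUS_T_LOW_MEXT_MAX_US 10000u /* max cumulative master extension per byte */

/* Hypothetical result type for one byte (START-to-ACK, ACK-to-ACK or ACK-to-STOP). */
struct byte_check {
    uint32_t mext_us;       /* accumulated clock-low extension in this byte */
    bool mext_violation;    /* accumulated extension exceeds Tlow:mext */
    bool timeout_violation; /* a single low period reached Ttimeout(min) */
};

/*
 * low_us[] holds the measured SCL low period of each clock in the byte.
 * Assumption: any time beyond nominal_low_us counts as master extension.
 */
struct byte_check smbus_check_byte(const uint32_t *low_us, size_t n,
                                   uint32_t nominal_low_us)
{
    struct byte_check r = { 0, false, false };

    for (size_t i = 0; i < n; i++) {
        if (low_us[i] > nominal_low_us)
            r.mext_us += low_us[i] - nominal_low_us;
        if (low_us[i] >= SMBUS_T_TIMEOUT_MIN_US)
            r.timeout_violation = true;
    }
    r.mext_violation = r.mext_us > SMBUS_T_LOW_MEXT_MAX_US;
    return r;
}
```

With roughly 5 us nominal low time at 100 kHz, a single 12 ms hold between the address ACK and the data byte would trip the Tlow:mext check without reaching Ttimeout.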
Yes, the spec does seem to assume that if a signal is stable for 20ms something has gone 'horribly wrong'.

I wasn't worried about that, our fpga does the whole transaction as a single command. None of our slaves generate interrupts - so it is purely master/slave.

If you run your process under the RT scheduler it is unlikely that pre-emption will be delayed by long enough to stop the process running for 10ms. I've seen >1ms delays (testing RTP audio), but most of the long loops have a cond_resched() in them.

...
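
For reference, a minimal sketch of moving a user process under the RT scheduler with the POSIX sched_setscheduler() call. The helper names are hypothetical, and the choice of SCHED_FIFO and of the priority is an assumption. The call normally needs CAP_SYS_NICE or a suitable RLIMIT_RTPRIO, so expect EPERM without privileges.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Clamp a requested priority into the valid SCHED_FIFO range. */
int fifo_priority_clamp(int prio)
{
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/*
 * Switch the calling process to SCHED_FIFO.
 * Returns 0 on success, or the errno value from sched_setscheduler().
 */
int make_self_realtime(int prio)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = fifo_priority_clamp(prio);
    if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1)
        return errno;
    return 0;
}
```

This only changes which runnable task is picked first. It does not change when the kernel gets to reschedule, which is the CONFIG_PREEMPT point made earlier in the thread.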
Probably depends on the device implementation. I've got multiple other I2C/SMBUS devices and the LM81 seems to be the one that objects.
I bet most don't implement any of the timeouts. I found one interesting pmbus device. Sometimes it would detect a STOP condition because the data line went high when it tri-stated its output driver in response to the rising clock edge! So it saw the same clock edge twice.
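
The false STOP follows directly from how I2C defines the condition: SDA rising while SCL is high. A minimal sketch of a detector over sampled bus states shows why releasing SDA just after the rising clock edge looks like a STOP. The sample type and function names are hypothetical, and real hardware works on edges and filters glitches rather than comparing discrete samples.

```c
#include <stdbool.h>
#include <stddef.h>

/* One sampled bus state (true = line high). */
struct bus_sample {
    bool scl;
    bool sda;
};

enum bus_event { BUS_NONE, BUS_START, BUS_STOP };

/* Classify the transition between two consecutive samples. */
enum bus_event bus_classify(struct bus_sample prev, struct bus_sample cur)
{
    if (prev.scl && cur.scl) {      /* SCL stayed high across both samples */
        if (prev.sda && !cur.sda)
            return BUS_START;       /* SDA fell while SCL high */
        if (!prev.sda && cur.sda)
            return BUS_STOP;        /* SDA rose while SCL high */
    }
    return BUS_NONE;
}

/* Count the STOP conditions seen in a trace of n samples. */
size_t bus_count_stops(const struct bus_sample *s, size_t n)
{
    size_t stops = 0;

    for (size_t i = 1; i < n; i++)
        if (bus_classify(s[i - 1], s[i]) == BUS_STOP)
            stops++;
    return stops;
}
```

A device that releases SDA while SCL is still low produces no event here. One that holds SDA low through the rising edge and only then tri-states produces a STOP.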
[0] - http://www.smbus.org/specs/smbus20.pdf
[1] - https://pmbus.org/Assets/PDFS/Public/SMBus_3_0_20141220.pdf
I should have both those - I've copied them to the directory where I'd look for them first!

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)