Thread (30 messages), 4 authors, 2021-03-18

Re: Errant readings on LM81 with T2080 SoC

From: Chris Packham <Chris.Packham@alliedtelesis.co.nz>
Date: 2021-03-08 02:27:56
Also in: linux-hwmon, linux-i2c, lkml

On 8/03/21 1:31 pm, Guenter Roeck wrote:
> On 3/7/21 2:52 PM, Chris Packham wrote:
>> Hi,
>>
>> I've got a system using a PowerPC T2080 SoC and among other things has
>> an LM81 hwmon chip.
>>
>> Under a high CPU load we see errant readings from the LM81 as well as
>> actual failures. It's the errant readings that cause the most concern
>> since we can easily ignore the read errors in our monitoring application
>> (although it would be better if they weren't there at all).

>> I'm able to reproduce this by running a test application[0] that
>> artificially creates a high CPU load and then repeatedly checking for
>> the all-1s values from the LM81 datasheet[1] (page 17). The all-1s
>> readings stick out as they are obviously higher than the voltage rails
>> that are connected and disagree with measurements taken with a multimeter.
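
As a rough illustration of where the all-1s values come from, the sketch below converts a raw 8-bit code of 0xFF to millivolts. It assumes each nominal rail voltage corresponds to 192 counts (3/4 of full scale) and that Vccp is nominally 2700 mV; both are assumptions picked because they reproduce the numbers quoted in this thread, not something confirmed against the driver source. The names `NOMINAL_MV` and `raw_to_mv` are made up for the example.

```python
# Assumption: the nominal voltage of each input maps to 192 ADC counts,
# so an all-1s code (0xFF) reads as 255/192 of the nominal voltage.
NOMINAL_MV = {
    "2.5V": 2500,
    "Vccp": 2700,   # assumed nominal value
    "3.3V": 3300,
    "5V": 5000,
    "12V": 12000,
}
NOMINAL_COUNTS = 192
ALL_ONES = 0xFF


def raw_to_mv(raw, nominal_mv):
    """Convert a raw 8-bit ADC code to millivolts, rounding to nearest."""
    return (raw * nominal_mv + NOMINAL_COUNTS // 2) // NOMINAL_COUNTS


full_scale = {name: raw_to_mv(ALL_ONES, mv) for name, mv in NOMINAL_MV.items()}

for name, mv in full_scale.items():
    print(f"{name}: {mv} mV")
```

Under these assumptions the 2.5V, Vccp, 3.3V and 5V channels come out as 3320, 3586, 4383 and 6641. The 12V channel comes out as 15938 rather than 15930, so a filter on the exact string 15930 may not catch an all-1s reading on that input, depending on how the driver rounds.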

>> Here's the output from my device
>>
>> [root@linuxbox ~]# cpuload 90&
>> [root@linuxbox ~]# (while true; do cat /sys/class/hwmon/hwmon0/in*_input | grep '3320\|4383\|6641\|15930\|3586'; sleep 1; done)&
>> 3586
>> 3586
>> cat: read error: No such device or address
>> cat: read error: No such device or address
>> 3320
>> 3320
>> 3586
>> 3586
>> 6641
>> 6641
>> 4383
>> 4383
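
For a monitoring application that wants to treat the two symptoms differently, a minimal sketch along the following lines may help. It separates reads that fail with ENXIO ("No such device or address") from reads that succeed but return one of the all-1s values. The sysfs glob and the value set are taken from the shell loop above; the function names `classify`, `read_sysfs` and `poll_once` are hypothetical.

```python
import errno
import glob

# All-1s readings (in mV) copied from the grep pattern in the shell loop.
SUSPECT_MV = {3320, 4383, 6641, 15930, 3586}


def classify(read_value):
    """Call read_value() and label the outcome 'ok', 'suspect' or 'enxio'."""
    try:
        value = int(read_value())
    except OSError as exc:
        if exc.errno == errno.ENXIO:
            # The bus driver reported that the chip did not respond.
            return "enxio", None
        raise
    return ("suspect" if value in SUSPECT_MV else "ok"), value


def read_sysfs(path):
    """Read one hwmon attribute as text."""
    with open(path) as f:
        return f.read()


def poll_once(pattern="/sys/class/hwmon/hwmon0/in*_input"):
    """Classify every voltage input matching the glob pattern once."""
    return {
        path: classify(lambda p=path: read_sysfs(p))
        for path in sorted(glob.glob(pattern))
    }


if __name__ == "__main__":
    for path, (status, value) in poll_once().items():
        print(path, status, value)
```

Unlike the grep, which matches substrings, this compares whole values, so a legitimate reading that merely contains one of the digit sequences is not flagged.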

>> Fundamentally I think this is a problem with the fact that the LM81 is
>> an SMBus device but the T2080 (and other Freescale SoCs) uses i2c and we
>> emulate SMBus. I suspect the errant readings are when we don't get round
>> to completing the read within the timeout specified by the SMBus
>> specification. Depending on when that happens we either fail the
>> transfer or interpret the result as all-1s.
> That is quite unlikely. Many sensor chips are SMBus chips connected to
> i2c busses. It is much more likely that there is a bug in the T2080 i2c driver,
> that the chip doesn't like the bulk read command issued through regmap, that
> the chip has problems with the i2c bus speed, or that the i2c bus is noisy.

Perhaps something gets upset when interrupt processing is delayed 
because of CPU load. I don't see the problem when there isn't a CPU load 
so I think that eliminates board issues.

> In this context, the "No such device or address" responses are very suspicious.
> Those are reported by the i2c driver, not by the hwmon driver, and suggest
> that the chip did not respond to a read request. Maybe it helps to enable
> debugging in the i2c driver to see if it reports anything useful. Even
> better might be to connect an i2c bus analyzer to the i2c bus and check
> what is going on.

That's from -ENXIO which is used in only one place in i2c-mpc.c. I'll 
enable some debug and see what we get.

> Guenter