Thread: 8 messages, 3 authors, 2021-06-25

RE: MCP2518FD Drivers Rarely Working with Custom Kernel 5.10.Y

From: Joshua Quesenberry <hidden>
Date: 2021-06-23 17:34:15

Hey!

I have attached config.txt so you all can see what I'm doing.

I added a print of the error number as Marc suggested, and the number is -110 (-ETIMEDOUT) every time.

[   25.660006] CAN device driver interface
[   25.668720] spi_master spi0: will run message pump with realtime priority
[   25.676697] mcp251xfd spi0.1 can0: MCP2518FD rev0.0 (-RX_INT -MAB_NO_WARN +CRC_REG +CRC_RX +CRC_TX +ECC -HD c:40.00MHz m:20.00MHz r:17.00MHz e:16.66MHz) successfully initialized.
[   25.684900] mcp251xfd spi0.0 can1: MCP2518FD rev0.0 (-RX_INT -MAB_NO_WARN +CRC_REG +CRC_RX +CRC_TX +ECC -HD c:40.00MHz m:20.00MHz r:17.00MHz e:16.66MHz) successfully initialized.
[   28.098033] mcp251xfd spi0.1 rename4: renamed from can0
[   28.175644] mcp251xfd spi0.0 can0: renamed from can1
[   28.225891] mcp251xfd spi0.1 can1: renamed from rename4
[  146.964971] mcp251xfd spi0.0: SPI transfer timed out
[  146.965023] spi_master spi0: failed to transfer one message from queue (ret=-110)
[  146.965216] mcp251xfd spi0.0 can0: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
[  146.965247] mcp251xfd spi0.0 can0: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
[  146.965277] mcp251xfd spi0.0 can0: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
[  146.965286] mcp251xfd spi0.0 can0: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000).
[  146.965331] mcp251xfd spi0.0 can0: CRC read error at address 0x0000 (length=4, data=00 00 00 00, CRC=0x0000) retrying.
[  146.965360] mcp251xfd spi0.0 can0: CRC read error at address 0x0000 (length=4, data=00 00 00 00, CRC=0x0000) retrying.
[  146.965389] mcp251xfd spi0.0 can0: CRC read error at address 0x0000 (length=4, data=00 00 00 00, CRC=0x0000) retrying.
[  146.965397] mcp251xfd spi0.0 can0: CRC read error at address 0x0000 (length=4, data=00 00 00 00, CRC=0x0000).
[  146.965413] A link change request failed with some changes committed already. Interface can0 may have been left with an inconsistent configuration, please check.
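For anyone comparing several such logs, a minimal sketch that pulls the SPI return code and the CRC read errors out of dmesg text may help. The function name `summarize`, the small errno table, and the regular expressions are illustrative assumptions written against the lines above; they are not part of the driver or of any existing tool.

```python
import re

# A few Linux errno values (asm-generic); hand-copied, not exhaustive.
LINUX_ERRNO = {5: "EIO", 22: "EINVAL", 110: "ETIMEDOUT"}

# Sample lines taken from the log in this message.
SAMPLE = """\
[  146.965023] spi_master spi0: failed to transfer one message from queue (ret=-110)
[  146.965216] mcp251xfd spi0.0 can0: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
[  146.965286] mcp251xfd spi0.0 can0: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000).
[  146.965397] mcp251xfd spi0.0 can0: CRC read error at address 0x0000 (length=4, data=00 00 00 00, CRC=0x0000).
"""

RET_RE = re.compile(r"\(ret=(-?\d+)\)")
CRC_RE = re.compile(
    r"(?P<dev>spi\d+\.\d+) (?P<ifname>\S+): CRC read error at address "
    r"0x(?P<addr>[0-9a-f]+) \(length=(?P<length>\d+), "
    r"data=(?P<data>[0-9a-f ]+), CRC=0x(?P<crc>[0-9a-f]+)\)"
    r"(?P<retry> retrying)?\."
)


def summarize(log_text):
    """Return (return codes, CRC read errors) found in dmesg-style text."""
    rets = []
    crc_errors = []
    for line in log_text.splitlines():
        m = RET_RE.search(line)
        if m:
            code = int(m.group(1))
            # Kernel return codes are negative errno values.
            rets.append((code, LINUX_ERRNO.get(-code, "unknown")))
        m = CRC_RE.search(line)
        if m:
            crc_errors.append({
                "dev": m.group("dev"),
                "ifname": m.group("ifname"),
                "addr": int(m.group("addr"), 16),
                "data": bytes.fromhex(m.group("data")),
                "crc": int(m.group("crc"), 16),
                "retrying": m.group("retry") is not None,
            })
    return rets, crc_errors


if __name__ == "__main__":
    rets, crc_errors = summarize(SAMPLE)
    print("return codes:", rets)
    for err in crc_errors:
        if not err["retrying"]:
            print("gave up reading 0x%04x on %s" % (err["addr"], err["ifname"]))
```

Entries with `retrying` set to False are the reads the driver gave up on; in the log above those are addresses 0x0e0c and 0x0000, both returning all-zero data.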

Regarding the discussion about Kconfig flags, I went ahead and rebuilt kernel 5.10.44 using a config that is essentially arch/arm/configs/bcm2711_defconfig plus the additions below, which are needed to get our I2S working. This should have undone the switch to the ONDEMAND governor and the 1000 Hz timer.

1030a1031
> CONFIG_SND_RPI_I2S_AUDIO_WM8782=m
1040a1042
> CONFIG_SND_SOC_WM8782=m

My RPi and HAT worked very reliably with the older Buster image and a customized kernel 4.19.73 (same tweaks as mentioned in my last email). In that kernel I'm using the MCP25XXFD driver from msperl, which is having issues under the 5.10.Y kernel too. I only upgraded everything on my system at the end of last week, so the hardware has been OK very recently.

Keep in mind I'm not seeing a total failure. I do occasionally see everything work correctly, and then I can run the ip link setup command without issue; it's just not common. Fully removing power from the system and reapplying it seems to help, but not every time, so maybe it's a coincidence. It could be an issue with subsequent configurations of the controller after the initial setup on power application, but in that case I'd expect it to work after every power yank.

I wouldn't feel comfortable reverting my /boot/config.txt to a stock one and a default setup of the 40-pin header, at least not with my HAT attached which includes the CAN controllers AND circuitry to supply power to RPi from a 12V rail.

Thanks,

Josh Q

-----Original Message-----
From: Patrick Menschel <redacted> 
Sent: Wednesday, June 23, 2021 1:24 AM
To: Joshua Quesenberry <redacted>; Marc Kleine-Budde <mkl@pengutronix.de>
Cc: kernel@pengutronix.de; linux-can@vger.kernel.org
Subject: Re: MCP2518FD Drivers Rarely Working with Custom Kernel 5.10.Y

On 23.06.21 at 04:59, Joshua Quesenberry wrote:
Thank you Marc, I had tried finding a Linux CAN forum, but 
unfortunately searching for "CAN" in Google is about the most 
unhelpful search term one could use... so thanks for replying and 
getting me to a more appropriate audience.

Reverting my system back to where CAN was working will probably be 
challenging. Our main goal was to get Boot from USB on the RPi 
enabled, but this unfortunately meant upgrading every piece of 
software and firmware available... previously we were still on Buster, 
but the OS snapshot was from Spring 2020 (if not Fall/Winter 2019), if 
not earlier, the firmware was much older, and the kernel was 4.19.73, 
wherein the MCP251XFD driver didn't exist yet. So getting back there 
will mean throwing a saved SD Card image on from Spring 2020 and then 
trying to figure out how to force downgrade the firmware. A colleague 
started this upgrade process for another project and was seeing these 
same results on two separate RPi, he did the OS and firmware upgrades, 
but I did the building of the 5.10.17 kernel. So including those two 
RPi and mine, that's three total systems with mostly non-working CAN 
where it had been working fine, my system has slightly newer RPi 
firmware now and the 5.10.44 kernel, the hope was maybe I'd pick up a 
patch somewhere, but no such luck. If you still think it would be 
beneficial to go through the effort of downgrading everything to 
verify the hardware I can do that, but just want to make sure before I 
start that since it'll take a while.

I updated spi.c to include printing the error number as you requested 
and that's all baking now. When I get into work in the morning (US 
EST) I'll get the changes deployed and try it out. Since this issue has 
a very high failure rate, getting a log shouldn't be a problem.

Some background on the custom kernel... when I switched to the 5.10.Y 
branch, I used arch/arm/configs/bcm2711_defconfig as my base config 
and then switched on preempt, switched to 1000Hz kernel timer, 
switched the default governor from powersave to ondemand, switched on 
debug flag (CONFIG_DEBUG_USER=y), enabled a few different CAN drivers 
we may encounter, and enabled some stuff for the WM8782 I2S chip. I 
probably should have recreated my config after 5.10.44, but I hadn't 
considered it until writing this. Looking at this diff, there are a few 
bits that are new that I could probably benefit from including, but I 
don't see anything that I'd be concerned about.

`diff bcm2711_defconfig hel_bcm2711_lowlatency_defconfig`
15d14
< CONFIG_ATA=m
43d41
< CONFIG_BH1750=m
53c51
< CONFIG_BLK_DEV_NVME=y
---
> CONFIG_BLK_DEV_NVME=m
120c118
< CONFIG_CAN_J1939=m
---
> CONFIG_CAN_KVASER_USB=m
123a122,123
> CONFIG_CAN_MCP25XXFD=m
> CONFIG_CAN_PEAK_USB=m
127d126
< CONFIG_CCS811=m
155c154
< CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE=y
---
> CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
158,159c157
< CONFIG_CPU_FREQ_GOV_ONDEMAND=y
< CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
---
> CONFIG_CPU_FREQ_GOV_POWERSAVE=y
184a183
> CONFIG_DEBUG_USER=y
209d207
< CONFIG_DRM_PANEL_JDI_LT070ME05000=m
319a318
> CONFIG_GENERIC_PHY=y
325d323
< CONFIG_GPIO_PCA953X_IRQ=y
395a394
> CONFIG_HZ_1000=y
561d559
< CONFIG_IR_TOY=m
826d823
< CONFIG_NF_LOG_ARP=m
828d824
< CONFIG_NF_LOG_NETDEV=m
950c946
< CONFIG_PREEMPT_VOLUNTARY=y
---
> CONFIG_PREEMPT=y
957d952
< CONFIG_QCA7000_UART=m
994d988
< CONFIG_RPI_POE_POWER=m
1040a1035
> # CONFIG_RTC_HCTOSYS is not set
1044,1045d1038
< CONFIG_SATA_AHCI=m
< CONFIG_SATA_MV=m
1054d1046
< CONFIG_SENSIRION_SGP30=m
1134a1127
> CONFIG_SND_RPI_I2S_AUDIO_WM8782=m
1149a1143
> CONFIG_SND_SOC_WM8782=m

The /boot/config.txt I included in the forum posts mentioned is 
tweaking the 40-pin header quite a bit from the default setup, we're 
using many of the pins for our HAT and planned for possibly adding 
more in the future.

Hi,

It would help to have a reference to that config.txt.

Regarding the changed Kconfig flags, I would suspect everything that is set to =y to be the culprit, especially everything that has connections to a clock.
Ever since the first rpi3, clocks have been unreliable in general due to the frequency governor. The rpi guys did their best to get rid of most of the initial problems, but the root cause remains.
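To narrow the list of suspects mechanically, here is a rough sketch that scans normal-format `diff` output (the usual `<` / `>` line markers) for symbols that are =y only on the custom-config side, and then flags the ones whose names hint at clocks. The helper names and the `CLOCK_HINTS` fragments are my own assumptions for illustration; which options really touch clocking has to be checked by hand.

```python
import re

# Name fragments treated as "clock-related" here; an illustrative assumption.
CLOCK_HINTS = ("CPU_FREQ", "HZ_")

# Excerpt of the defconfig diff from this thread, in normal diff format.
SAMPLE_DIFF = """\
155c154
< CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE=y
---
> CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
184a183
> CONFIG_DEBUG_USER=y
395a394
> CONFIG_HZ_1000=y
950c946
< CONFIG_PREEMPT_VOLUNTARY=y
---
> CONFIG_PREEMPT=y
1134a1127
> CONFIG_SND_RPI_I2S_AUDIO_WM8782=m
"""

# Lines starting with "> " belong to the second file given to diff,
# which in this thread is the custom config.
ADDED_BUILTIN_RE = re.compile(r"^> (CONFIG_\w+)=y\s*$")


def builtin_only_in_custom(diff_text):
    """List symbols that appear as =y on the '>' side of the diff."""
    symbols = []
    for line in diff_text.splitlines():
        m = ADDED_BUILTIN_RE.match(line)
        if m:
            symbols.append(m.group(1))
    return symbols


def clock_related(symbols):
    """Keep only symbols whose name contains one of CLOCK_HINTS."""
    return [s for s in symbols if any(hint in s for hint in CLOCK_HINTS)]


if __name__ == "__main__":
    builtin = builtin_only_in_custom(SAMPLE_DIFF)
    print("=y only in custom config:", builtin)
    print("possibly clock-related:  ", clock_related(builtin))
```

On the excerpt above this leaves the ONDEMAND default governor and HZ_1000 as the clock-related candidates, with PREEMPT and DEBUG_USER as the remaining =y changes.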

The interesting question is, does a stock raspbian buster work with your hardware and that config.txt?

I've been running a stock raspbian buster on a rpi3b+ with a seeed can fd hat v2
24/7 for a couple of months now and have not experienced any problems.

Regards,
Patrick
