X-Gene: Unhandled fault: synchronous external abort in pci_generic_config_read32
From: Duc Dang <hidden>
Date: 2015-07-28 17:46:00
Also in: linux-pci, lkml
On Tue, Jul 28, 2015 at 9:43 AM, Bjorn Helgaas [off-list ref] wrote:
> On Fri, Jul 24, 2015 at 7:05 PM, Duc Dang [off-list ref] wrote:
>> Hi Bjorn,
>>
>> On Fri, Jul 24, 2015 at 3:42 PM, Bjorn Helgaas [off-list ref] wrote:
>>> I regularly see faults like this on an APM X-Gene:
>>>
>>>   U-Boot 2013.04-mustang_sw_1.14.14 (Dec 16 2014 - 15:59:33)
>>>   CPU0: APM ARM 64-bit Potenza Rev B0 2400MHz PCP 2400MHz
>>>   32 KB ICACHE, 32 KB DCACHE
>>>   SOC 2000MHz IOBAXI 400MHz AXI 250MHz AHB 200MHz GFC 125MHz
>>>   ...
>>>   Unhandled fault: synchronous external abort (0x96000010) at 0xffffff8000110034
>>>   Internal error: : 96000010 [#1] SMP
>>>   Modules linked in:
>>>   CPU: 0 PID: 3723 Comm: ... 4.1.0-smp-DEV #3
>>>   Hardware name: APM X-Gene Mustang board (DT)
>>>   task: ffffffc7dc1a4140 ti: ffffffc7dc118000 task.ti: ffffffc7dc118000
>>>   PC is at pci_generic_config_read32+0x4c/0xb8
>>>   LR is at pci_generic_config_read32+0x40/0xb8
>>>   pc : [<ffffffc00033b90c>] lr : [<ffffffc00033b900>] pstate: 600001c5
>>>   ...
>>>   Call trace:
>>>   [<ffffffc00033b90c>] pci_generic_config_read32+0x4c/0xb8
>>>   [<ffffffc00033bf58>] pci_user_read_config_byte+0x60/0xc4
>>>   [<ffffffc0003496a8>] pci_read_config+0x15c/0x238
>>>   [<ffffffc0002393b4>] sysfs_kf_bin_read+0x68/0xa0
>>>   [<ffffffc00023896c>] kernfs_fop_read+0x9c/0x1ac
>>>   [<ffffffc0001c361c>] __vfs_read+0x44/0x128
>>>   [<ffffffc0001c3e28>] vfs_read+0x84/0x144
>>>   [<ffffffc0001c4764>] SyS_read+0x50/0xb0
>>
>> The log shows the kernel gets an exception when trying to access the
>> Mellanox card's configuration space. This is usually due to suboptimal
>> PCIe SerDes parameters being used on your board, which cause bad link
>> quality. The PCIe SerDes programming is done in U-Boot, so I suggest
>> you upgrade to our latest X-Gene U-Boot release.
>
> I installed U-Boot 1.15.12, which I thought was the latest. I'm still
> seeing this issue regularly, approx once/hour.

Our latest U-Boot is 1.15.15, but U-Boot 1.15.12 is already a good version to use.

Are you running any PCIe traffic test when the error happens? I will try to reproduce the issue on my Mustang board as well. It would also be useful if you could share your "lspci -vvv" output while the board is running, so we can check whether any error status is reported.

--
Regards,
Duc Dang.
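
For anyone collecting the same data, here is a minimal sketch of one way to scan saved "lspci -vvv" output for devices that report an error in the PCI Express Device Status ("DevSta:") line. It is an illustration only, not something posted in this thread: the function name find_reported_errors is hypothetical, and it assumes the usual lspci layout in which each device starts on an unindented line and set flags are marked with a trailing "+". Flag spellings differ between pciutils versions, so the sketch matches any set flag whose name ends in "Err" or "Req" rather than a fixed list.

```python
import re


def find_reported_errors(lspci_text):
    """Map each device header line to the DevSta error flags that are set.

    Assumes `lspci -vvv` style text: device headers are unindented, and the
    Device Status line looks like "DevSta: CorrErr+ UncorrErr- ...".
    """
    errors = {}
    device = None
    for line in lspci_text.splitlines():
        # An unindented, non-empty line starts a new device section.
        if line and not line[0].isspace():
            device = line.strip()
            continue
        match = re.search(r"DevSta:\s*(.*)", line)
        if match is None or device is None:
            continue
        # Keep flags marked "+" whose name ends in Err or Req; this skips
        # non-error status bits such as AuxPwr and TransPend.
        set_flags = [
            token[:-1]
            for token in match.group(1).split()
            if token.endswith("+") and token[:-1].endswith(("Err", "Req"))
        ]
        if set_flags:
            errors[device] = set_flags
    return errors
```

A typical use would be to run "lspci -vvv" as root (without root the capability details are usually not shown), save the output to a file, and pass the file contents to the function. Devices with a clean Device Status are omitted from the result.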