PCI: Work around PCIe link training failures
From: Matthew W Carlis <hidden>
Date: 2024-08-15 19:41:06
Also in:
linux-pci, linux-rdma, lkml, netdev
Sorry for the delay in my responses here; I had some things get in my way.

On Fri, 9 Aug 2024 09:13:52 Oliver O'Halloran [off-list ref] wrote:
Ok? If we have to check for DPC being enabled in addition to checking the surprise bit in the slot capabilities then that's fine, we can do that. The question to be answered here is: how should this feature work on ports where it's normal for a device to be removed without any notice?

I'm not sure if it's the correct thing to check, however. I assumed that ports using the pciehp driver would usually consider it "normal" for a device to be removed, but maybe I have the idea of hotplug reversed.

On Fri, 9 Aug 2024 14:34:04 Maciej W. Rozycki [off-list ref] wrote:
Well, in principle in a setup with reliable links the LBMS bit may never be set, e.g. this system of mine has been in 24/7 operation since the last reboot 410 days ago and for the devices that support Link Active reporting it shows: ... so out of 11 devices 6 have the LBMS bit clear. But then 5 have it set, perhaps worryingly, so of course you're right, that it will get set in the field, though it's not enough by itself for your problem to trigger.

The way I look at it is that it's essentially a probability distribution with time, but I try to avoid learning too much about the physical layer because I would find myself debugging more hardware issues lol. I also don't think LBMS/LABS being set by itself is very interesting without knowing the rate at which it is being set. FWIW I have seen some devices in the past going into the recovery state many times a second and still never downtrain, but at the same time they were setting the LBMS/LABS bits, which may not be quite spec compliant.

I would like to help test these changes, but I would like to avoid having to test each mentioned change individually. Does anyone have any preferences in how I batch the patches for testing? Would it be ok if I just pulled them all together in one go?

- Matt