Thread: 44 messages, 5 authors, 2025-06-10

Re: PCI: Work around PCIe link training failures

From: "Oliver O'Halloran" <oohall@gmail.com>
Date: 2024-08-08 23:14:04
Also in: linux-pci, linux-rdma, linuxppc-dev, lkml

On Thu, Aug 8, 2024 at 12:08 PM Matthew W Carlis [off-list ref] wrote:
> On Wed, 7 Aug 2024 22:29:35 +1000 Oliver O'Halloran Wrote
>> My read was that Matt is essentially doing a surprise hot-unplug by
>> removing power to the card without notifying the OS. I thought the
>> LBMS bit wouldn't be set in that case since the link goes down rather
>> than changes speed, but the spec is a little vague and that appears to
>> be happening in Matt's testing. It might be worth disabling the
>> workaround if the port has the surprise hotplug capability bit set.
> Most of the systems I have are using downstream port containment which does
> not recommend setting the Hot-Plug Surprise in Slot Capabilities & therefore
> we do not. The first time we noticed an issue with this patch was in test
> automation which was power cycling the endpoints & injecting uncorrectable
> errors to ensure our hosts are robust in the face of PCIe chaos & that they
> will recover. Later we started to see other teams on other products
> encountering the same bug in simpler cases where humans turn on and off
> EP power for development purposes.
Ok? If we have to check for DPC being enabled in addition to checking
the surprise bit in the slot capabilities then that's fine, we can do
that. The question to be answered here is: how should this feature
work on ports where it's normal for a device to be removed without any
notice?
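
For illustration only, the combined check mentioned above (Hot-Plug Surprise set, or DPC triggering enabled) could look roughly like the sketch below. It is a standalone userspace sketch, not kernel code and not a proposed patch: the function names are hypothetical, the register values are assumed to have been read already, and the bit masks are written out locally on the assumption that they match the PCIe spec layout (and the similarly named PCI_EXP_SLTCAP_HPS / PCI_EXP_DPC_CTL_EN_* macros in pci_regs.h). What the workaround should then do on such ports is exactly the open question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layouts, spelled out locally so the sketch is self-contained. */
#define SLTCAP_HPS          0x00000020u /* Slot Capabilities: Hot-Plug Surprise */
#define DPC_CTL_EN_FATAL    0x0001u     /* DPC Control: trigger on fatal errors */
#define DPC_CTL_EN_NONFATAL 0x0002u     /* DPC Control: trigger on non-fatal or fatal */

/* Hypothetical helper: true if the port has a DPC capability and its
 * trigger-enable field is non-zero. */
static bool dpc_trigger_enabled(bool has_dpc_cap, uint16_t dpc_ctl)
{
    return has_dpc_cap &&
           (dpc_ctl & (DPC_CTL_EN_FATAL | DPC_CTL_EN_NONFATAL)) != 0;
}

/* Hypothetical policy check: true if a device disappearing without notice
 * should be treated as normal on this port, i.e. either the slot advertises
 * Hot-Plug Surprise or DPC is enabled. */
static bool removal_without_notice_expected(uint32_t sltcap,
                                            bool has_dpc_cap,
                                            uint16_t dpc_ctl)
{
    return (sltcap & SLTCAP_HPS) != 0 ||
           dpc_trigger_enabled(has_dpc_cap, dpc_ctl);
}
```

A caller would presumably consult something like removal_without_notice_expected() before applying the link retraining workaround, but whether to skip it, delay it, or do something else in that case is not settled here.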