Re: PCI: Work around PCIe link training failures

[PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 01/14] PCI: pciehp: Rely on `link_active_reporting' · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 02/14] PCI: Export PCIe link retrain timeout · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 03/14] PCI: Execute `quirk_enable_clear_retrain_link' earlier · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 04/14] PCI: Initialize `link_active_reporting' earlier · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 05/14] powerpc/eeh: Rely on `link_active_reporting' · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 06/14] net/mlx5: Rely on `link_active_reporting' · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 08/14] PCI: Use distinct local vars in `pcie_retrain_link' · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 07/14] PCI: Export `pcie_retrain_link' for use outside ASPM · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 11/14] PCI: Use `pcie_wait_for_link_status' in `pcie_wait_for_link_delay' · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 09/14] PCI: Factor our waiting for link training end · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 12/14] PCI: Provide stub failed link recovery for device probing and hot plug · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
PCI: Work around PCIe link training failures · Matthew W Carlis <hidden> · 2024-07-22
Re: PCI: Work around PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2024-07-22
PCI: Work around PCIe link training failures · Matthew W Carlis <hidden> · 2024-07-24
PCI: Work around PCIe link training failures · Matthew W Carlis <hidden> · 2024-07-26
Re: PCI: Work around PCIe link training failures · Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> · 2024-07-29
Re: PCI: Work around PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2024-07-29
PCI: Work around PCIe link training failures · Matthew W Carlis <hidden> · 2024-07-29
[PATCH v9 10/14] PCI: Add support for polling DLLLA to `pcie_retrain_link' · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 13/14] PCI: Add failed link recovery for device reset events · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
[PATCH v9 14/14] PCI: Work around PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-11
Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures · Bjorn Helgaas <helgaas@kernel.org> · 2023-06-14
Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-15
Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures · Bjorn Helgaas <helgaas@kernel.org> · 2023-06-15
Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-16
Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures · Bjorn Helgaas <helgaas@kernel.org> · 2023-06-16
Re: [PATCH v9 00/14] pci: Work around ASMedia ASM2824 PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2023-06-20
PCI: Work around PCIe link training failures · Matthew W Carlis <hidden> · 2024-08-06
Re: PCI: Work around PCIe link training failures · Bjorn Helgaas <helgaas@kernel.org> · 2024-08-06
PCI: Work around PCIe link training failures · Matthew W Carlis <hidden> · 2024-08-07
Re: PCI: Work around PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2024-08-07
Re: PCI: Work around PCIe link training failures · "Oliver O'Halloran" <oohall@gmail.com> · 2024-08-07
Re: PCI: Work around PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2024-08-07
PCI: Work around PCIe link training failures · Matthew W Carlis <hidden> · 2024-08-08
Re: PCI: Work around PCIe link training failures · "Oliver O'Halloran" <oohall@gmail.com> · 2024-08-08
Re: PCI: Work around PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2024-08-09
PCI: Work around PCIe link training failures · Matthew W Carlis <hidden> · 2024-08-15
Re: PCI: Work around PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2024-08-16
PCI: Work around PCIe link training failures · Matthew W Carlis <hidden> · 2024-10-01
Re: PCI: Work around PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2024-10-02
Re: PCI: Work around PCIe link training failures · Bjorn Helgaas <helgaas@kernel.org> · 2024-10-02
Re: PCI: Work around PCIe link training failures · "Maciej W. Rozycki" <macro@orcam.me.uk> · 2024-10-03
PCI: Work around PCIe link training failures · Matthew W Carlis <hidden> · 2025-06-10

From: "Maciej W. Rozycki" <macro@orcam.me.uk>
Date: 2024-08-16 13:57:11
Also in: linux-pci, linux-rdma, linuxppc-dev, lkml

On Thu, 15 Aug 2024, Matthew W Carlis wrote:


> > Well, in principle in a setup with reliable links the LBMS bit may never
> > be set, e.g. this system of mine has been in 24/7 operation since the last
> > reboot 410 days ago and for the devices that support Link Active reporting
> > it shows:
> > ...
> > so out of 11 devices 6 have the LBMS bit clear.  But then 5 have it set,
> > perhaps worryingly, so of course you're right that it will get set in the
> > field, though it's not enough by itself for your problem to trigger.
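A tally like the one above (LBMS set vs. clear, counted only over ports capable of Link Active reporting) could be derived from raw register values roughly as in the following C sketch. It assumes the caller has already read each port's Link Capabilities and Link Status registers from the PCI Express capability (offsets 0x0c and 0x12). The mask names and values mirror Linux's `include/uapi/linux/pci_regs.h`. The `struct link_regs` type and the `tally_lbms()` helper are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masks as defined in Linux include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKCAP_DLLLARC 0x00100000u /* Data Link Layer Link Active Reporting Capable */
#define PCI_EXP_LNKSTA_DLLLA   0x2000u     /* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS    0x4000u     /* Link Bandwidth Management Status */

/* Hypothetical snapshot of one port's link registers. */
struct link_regs {
	uint32_t lnkcap;	/* Link Capabilities, PCIe capability offset 0x0c */
	uint16_t lnksta;	/* Link Status, PCIe capability offset 0x12 */
};

/*
 * Hypothetical helper: among ports capable of Link Active reporting,
 * count those with LBMS set and those with it clear.  Ports without
 * the capability are skipped entirely, matching the filter used above.
 */
void tally_lbms(const struct link_regs *regs, size_t n,
		unsigned *set, unsigned *clear)
{
	assert(set != NULL && clear != NULL);
	*set = *clear = 0;
	for (size_t i = 0; i < n; i++) {
		if (!(regs[i].lnkcap & PCI_EXP_LNKCAP_DLLLARC))
			continue;	/* not Link Active reporting capable */
		if (regs[i].lnksta & PCI_EXP_LNKSTA_LBMS)
			(*set)++;
		else
			(*clear)++;
	}
}
```

Note that a single snapshot like this only says whether LBMS has been set at some point since it was last cleared. It says nothing about how often that happened.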

> The way I look at it is that it's essentially a probability distribution over time,
> but I try to avoid learning too much about the physical layer because I would find
> myself debugging more hardware issues lol. I also don't think LBMS/LABS being set
> by itself is very interesting without knowing the rate at which it is being set.

 Agreed.  Ilpo's upcoming bandwidth controller will hopefully give us such 
data.
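Until such data is available, one crude way to estimate the rate is to poll the Link Status register at a fixed interval, count each sample that finds LBMS set, and then write 1 to the bit to clear it, since LBMS is RW1C. The C sketch below does only the bookkeeping over samples that have already been captured. The `lbms_rate()` helper and its parameters are hypothetical, and the sketch makes no claim about how the bandwidth controller itself measures this. Several link events between two polls collapse into a single observation, so the result is a lower bound on the true rate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_EXP_LNKSTA_LBMS 0x4000u /* as in include/uapi/linux/pci_regs.h */

/*
 * Hypothetical helper: given n Link Status values sampled every
 * interval_s seconds (LBMS cleared after each read, as it is RW1C),
 * return the observed LBMS assertions per second.  Multiple events
 * within one interval are counted once, so this underestimates bursts.
 */
double lbms_rate(const uint16_t *samples, size_t n, double interval_s)
{
	size_t hits = 0;

	assert(samples != NULL && n > 0 && interval_s > 0.0);
	for (size_t i = 0; i < n; i++)
		if (samples[i] & PCI_EXP_LNKSTA_LBMS)
			hits++;
	return hits / (n * interval_s);
}
```

A shorter polling interval tightens the lower bound, at the cost of more configuration space accesses.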

> FWIW I have seen some devices in the past going into recovery state many times a
> second & still never downtrain, but at the same time they were setting the
> LBMS/LABS bits, which may not be quite spec-compliant.
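One way to tell such devices apart from genuine downtraining is to compare the Current Link Speed and Negotiated Link Width fields of two successive Link Status readings, in addition to checking LBMS/LABS. The following C sketch does this. The field masks follow `PCI_EXP_LNKSTA_CLS` and `PCI_EXP_LNKSTA_NLW` from `pci_regs.h`. The `enum link_event` type and the `classify_link()` helper are hypothetical. In this sketch, "downtrained" simply means a lower speed code or a narrower width than the previous reading.

```c
#include <stdint.h>

/* Field masks as in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKSTA_CLS  0x000fu /* Current Link Speed (1 = 2.5GT/s, 2 = 5GT/s, ...) */
#define PCI_EXP_LNKSTA_NLW  0x03f0u /* Negotiated Link Width, bits 9:4 */
#define PCI_EXP_LNKSTA_LBMS 0x4000u /* Link Bandwidth Management Status */
#define PCI_EXP_LNKSTA_LABS 0x8000u /* Link Autonomous Bandwidth Status */

/* Hypothetical classification of a change between two readings. */
enum link_event {
	LINK_QUIET,		/* no bandwidth event, no drop */
	LINK_BW_EVENT_ONLY,	/* LBMS/LABS set, speed and width not reduced */
	LINK_DOWNTRAINED,	/* speed or width lower than before */
};

enum link_event classify_link(uint16_t prev, uint16_t cur)
{
	unsigned prev_speed = prev & PCI_EXP_LNKSTA_CLS;
	unsigned cur_speed  = cur & PCI_EXP_LNKSTA_CLS;
	unsigned prev_width = (prev & PCI_EXP_LNKSTA_NLW) >> 4;
	unsigned cur_width  = (cur & PCI_EXP_LNKSTA_NLW) >> 4;

	/* Speed codes and widths both grow with bandwidth, so compare numerically. */
	if (cur_speed < prev_speed || cur_width < prev_width)
		return LINK_DOWNTRAINED;
	if (cur & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS))
		return LINK_BW_EVENT_ONLY;
	return LINK_QUIET;
}
```

For example, a port that stays at speed code 4 and width x4 (Link Status 0x0044) but keeps reporting 0x4044 would be classified as `LINK_BW_EVENT_ONLY`. That matches the behaviour described above.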

> I would like to help test these changes, but I would like to avoid having to test
> each mentioned change individually. Does anyone have any preferences in how I batch
> the patches for testing? Would it be ok if I just pulled them all together in one go?

 Certainly fine with me, especially as 3/4 and 4/4 aren't really related
to your failure scenario, and then you need 1/4 and 2/4 together to
address both aspects of the issue you have reported.

  Maciej
