[PATCH v3 3/8] PCI: brcmstb: Add Broadcom STB PCIe host controller driver
From: helgaas@kernel.org (Bjorn Helgaas)
Date: 2017-12-15 20:14:44
Also in:
linux-devicetree, linux-mips, linux-pci, lkml
On Fri, Dec 15, 2017 at 02:53:57PM -0500, Jim Quinlan wrote:
On Thu, Dec 14, 2017 at 5:51 PM, Bjorn Helgaas [off-list ref] wrote:quoted
On Wed, Dec 13, 2017 at 06:53:53PM -0500, Jim Quinlan wrote:quoted
On Tue, Dec 12, 2017 at 5:16 PM, Bjorn Helgaas [off-list ref] wrote:quoted
On Tue, Nov 14, 2017 at 05:12:07PM -0500, Jim Quinlan wrote:quoted
This commit adds the basic Broadcom STB PCIe controller. Missing is the ability to process MSI and also handle dma-ranges for inbound memory accesses. These two functionalities are added in subsequent commits. The PCIe block contains an MDIO interface. This is a local interface only accessible by the PCIe controller. It cannot be used or shared by any other HW. As such, the small amount of code for this controller is included in this driver as there is little upside to put it elsewhere....quoted
quoted
quoted
+static bool brcm_pcie_valid_device(struct brcm_pcie *pcie, struct pci_bus *bus, + int dev) +{ + if (pci_is_root_bus(bus)) { + if (dev > 0) + return false; + } else { + /* If there is no link, then there is no device */ + if (!brcm_pcie_link_up(pcie)) + return false;This is racy, since the link can go down after you check but before you do the config access. I assume your hardware can deal with a config access that targets a link that is down?Yes, that can happen but there is really nothing that can be done if the link goes down in that vulnerability window. What do you suggest doing?Most hardware drops writes and returns ~0 on reads if the link is down. I assume your hardware does something similar, and that should be enough. You shouldn't have to check whether the link is up.Unfortunately our HW is quite unforgiving and effects a synchronous or asynchronous abort when doing a config access when the link or power has gone down on the EP. I will open a discussion with the PCIe HW folks regarding why our controller does not behave like "most hardware". Thanks, Jim
I mentioned in the other thread [1] that I think the best way to handle this is to figure out how to deal with the abort cleanly. Using a test like *_pcie_link_up() to try to avoid it is a 99% solution that means we'll see rare failures that are very difficult to reproduce and debug.
quoted
The hardware might report errors, e.g., via AER, if the link is down. And we might not not handle those nicely. If we have issues there, we should find out and fix them.
[1] https://lkml.kernel.org/r/20171214225821.GN30595 at bhelgaas-glaptop.roam.corp.google.com