Re: [PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()
From: Marta Rybczynska <hidden>
Date: 2019-08-22 17:07:35
Also in:
linux-pci
----- On 16 Aug, 2019, at 18:50, Sergey Miroshnichenko s.miroshnichenko@yadro.com wrote:
quoted hunk ↗ jump to hunk
This is a yet another approach to fix an old [1-2] concurrency issue, when: - two or more devices are being hot-added into a bridge which was initially empty; - a bridge with two or more devices is being hot-added; - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges. The problem is that a bridge is reported as enabled before the MEM/IO bits are actually written to the PCI_COMMAND register, so another driver thread starts memory requests through the not-yet-enabled bridge: CPU0 CPU1 pci_enable_device_mem() pci_enable_device_mem() pci_enable_bridge() pci_enable_bridge() pci_is_enabled() return false; atomic_inc_return(enable_cnt) Start actual enabling the bridge ... pci_is_enabled() ... return true; ... Start memory requests <-- FAIL ... Set the PCI_COMMAND_MEMORY bit <-- Must wait for this Protect the pci_enable/disable_device() and pci_enable_bridge(), which is similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race while enabling upstream bridges"), but adding a per-device mutexes and preventing the dev->enable_cnt from from incrementing early. CC: Srinath Mannam <redacted> CC: Marta Rybczynska <redacted> Signed-off-by: Sergey Miroshnichenko <redacted> [1] https://lore.kernel.org/linux-pci/1501858648-22228-1-git-send-email-srinath.mannam@broadcom.com/T/#u (local) [RFC PATCH v3] pci: Concurrency issue during pci enable bridge [2] https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.JavaMail.zimbra@kalray.eu/T/#u (local) [RFC PATCH] nvme: avoid race-conditions when enabling devices --- drivers/pci/pci.c | 26 ++++++++++++++++++++++---- drivers/pci/probe.c | 1 + include/linux/pci.h | 1 + 3 files changed, 24 insertions(+), 4 deletions(-)diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 1b27b5af3d55..e7f8c354e644 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c@@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev)struct pci_dev *bridge; int retval; + mutex_lock(&dev->enable_mutex); + bridge = pci_upstream_bridge(dev); if (bridge) pci_enable_bridge(bridge);@@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev)if (pci_is_enabled(dev)) { if (!dev->is_busmaster) pci_set_master(dev); + mutex_unlock(&dev->enable_mutex); return; }
This code is used by numerous drivers and when we've seen that issue I was wondering if there are some use-cases when this (or pci_disable_device) is called with interrupts disabled. It seems that it shouldn't be, but a BUG_ON or error when someone calls it this way would be helpful when debugging. Marta