Thread (17 messages) 17 messages, 5 authors, 2025-11-24

Re: [PATCH 1/2] PCI: Ensure error recoverability at all times

From: Bjorn Helgaas <helgaas@kernel.org>
Date: 2025-11-13 16:16:07
Also in: linux-pci

On Thu, Nov 13, 2025 at 10:38:09AM +0100, Lukas Wunner wrote:
On Wed, Nov 12, 2025 at 04:38:31PM -0600, Bjorn Helgaas wrote:
quoted
On Sun, Oct 12, 2025 at 03:25:01PM +0200, Lukas Wunner wrote:
quoted
Despite these workarounds, recoverability at all times is not guaranteed:
E.g. when a PCIe port goes through a runtime suspend and resume cycle,
the "saved_state" flag is cleared by:

  pci_pm_runtime_resume()
    pci_pm_default_resume_early()
      pci_restore_state()

... and hence on a subsequent AER event, the port's Config Space cannot be
restored.  
I guess this restore would be done by a driver's
pci_error_handlers.slot_reset() or .reset_done() calling
pci_restore_state()?
Yes.  Restoration of config space after an error-recovery-induced reset
is currently always the job of the device driver.

E.g. in the case of portdrv, it happens in pcie_portdrv_slot_reset().

We could revisit this design decision and change the behavior to have
pcie_do_recovery() call pci_restore_state(), thus reducing boilerplate
in the drivers.  But that would be a separate effort, orthogonal to the
present patch.
Agreed.
quoted
quoted
+++ b/drivers/pci/bus.c
@@ -358,6 +358,13 @@ void pci_bus_add_device(struct pci_dev *dev)
 	pci_bridge_d3_update(dev);
 
 	/*
+	 * Save config space for error recoverability.  Clear state_saved
+	 * to detect whether drivers invoked pci_save_state() on suspend.
Can we expand this a little to explain how this is detected and what
drivers *should* be doing?
That is documented in Documentation/power/pci.rst, "3.1.2. suspend()":

   "This callback is expected to quiesce the device and prepare it to be
    put into a low-power state by the PCI subsystem.  It is not required
    (in fact it even is not recommended) that a PCI driver's suspend()
    callback save the standard configuration registers of the device [...]

    However, in some rare case it is convenient to carry out these
    operations in a PCI driver.  Then, pci_save_state() [...] should be
    used to save the device's standard configuration registers [...].
    Moreover, if the driver calls pci_save_state(), the PCI subsystem will
    not execute either pci_prepare_to_sleep(), or pci_set_power_state()
    for its device, so the driver is then responsible for handling the
    device as appropriate."
Yes.  I should have proposed some text for the comment, e.g.,

  Save config space for error recoverability.  Clear state_saved.  If
  driver calls pci_save_state() again, state_saved will be set and
  we'll know that on suspend, the PCI core shouldn't call
  pci_save_state() or change the device power state.
quoted
I think the reason is that the PCI core
can invoke pci_save_state() on suspend if the driver did not.
Right.  By calling pci_save_state(), a driver signals to the PCI core
that it assumes responsibility for putting the device into a low power
state.  If a driver wants to keep a device in D0, it could call
pci_save_state() and thus prevent the PCI core from putting it e.g.
into D3.
It seems like there are two things going on here, and I'm not sure
they're completely compatible:

  1) Driver calls pci_save_state() to take over device power
     management and prevent the PCI core from doing it.

  2) Driver calls pci_save_state() to capture the device state it
     wants to restore when recovering from an error.

Shouldn't a driver be able to do 2) without also getting 1)?
quoted
I assume:

  - PCI core always calls pci_save_state() and clears state_saved when
    device is enumerated (below)

  - When it has configured the device to the state it wants restore,
    the driver may call pci_save_state() again, which will set
    state_saved

  - If driver has not called pci_save_state(), i.e., state_saved is
    still clear, we want the PCI core to call pci_save_state() during
    suspend
Right.
quoted
This sounds sensible to me.  It would be nice if there were a few more
words about pci_save_state() and pci_restore_state() in
Documentation/.

pci_save_state() isn't mentioned at all in Documentation/PCI
Right, it's documented in the Documentation/power directory. :)
Yes, in the pci.rst I mentioned, but it mostly uses the "saves the
device's standard configuration registers" wording.

I'm just wishing for a more concrete mention of "pci_save_state()",
since that's where the critical "state_saved" flag is updated.

And I'm not sure Documentation/ includes anything about the idea of
a driver using pci_save_state() to capture the state it wants to
restore after an error.  I guess that's obvious now that I write it
down, but I'm sure learning a lot from this conversation :)

Bjorn
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help