Thread: 17 messages, 5 authors, 2025-11-24

Re: [PATCH 0/2] PCI: Universal error recoverability of devices

From: Bjorn Helgaas <helgaas@kernel.org>
Date: 2025-11-14 23:45:50
Also in: linux-pci

On Sun, Oct 12, 2025 at 03:25:00PM +0200, Lukas Wunner wrote:
When PCI devices are reset -- either to recover from an error or
after a D3hot/D3cold transition -- their Config Space needs to be
restored.

D3hot/D3cold transitions happen under the control of the kernel,
hence it is able to save Config Space before and restore it afterwards.

However errors may occur unexpectedly and it may then be impossible
to save Config Space because the device may be inaccessible (e.g. DPC)
or Config Space may be corrupted.  So it must be saved ahead of time.

This isn't done consistently because the PCI core doesn't take care
of it and only a subset of drivers do.  The situation is aggravated
by the behavior of pci_restore_state(), which only allows restoring
Config Space once and invalidates the saved copy afterwards.

Solve all these problems by saving an initial copy of Config Space
on device addition which drivers may update if they change registers.
Modify pci_restore_state() to allow using the saved copy indefinitely
and drop all the workarounds for its previous behavior that have
accumulated in the tree.
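
To illustrate the restore-once problem described above, here is a minimal user-space sketch. It is a toy model, not kernel code: struct model_dev and every model_* function are hypothetical names invented for this example, and only the "saved copy is invalidated after one restore" versus "saved copy stays valid" distinction is taken from the description. The assumption is that a device suffers several resets in a row while the driver never gets a chance to re-save its state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CFG_DWORDS 16 /* model only a 64-byte header, for brevity */

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct model_dev {
	uint32_t config[CFG_DWORDS]; /* "live" Config Space */
	uint32_t saved[CFG_DWORDS];  /* saved copy */
	bool state_saved;            /* is the saved copy valid? */
};

/* Take a snapshot of the live Config Space. */
static void model_save_state(struct model_dev *dev)
{
	assert(dev != NULL);
	memcpy(dev->saved, dev->config, sizeof(dev->saved));
	dev->state_saved = true;
}

/* Previous behavior as described: a restore invalidates the saved copy. */
static bool model_restore_once(struct model_dev *dev)
{
	if (!dev->state_saved)
		return false;
	memcpy(dev->config, dev->saved, sizeof(dev->config));
	dev->state_saved = false;
	return true;
}

/* Proposed behavior as described: the saved copy stays valid. */
static bool model_restore_reusable(struct model_dev *dev)
{
	if (!dev->state_saved)
		return false;
	memcpy(dev->config, dev->saved, sizeof(dev->config));
	return true;
}

/* A reset wipes the live Config Space. */
static void model_reset(struct model_dev *dev)
{
	memset(dev->config, 0, sizeof(dev->config));
}

/*
 * Save once (as if at device addition), then reset the device `resets`
 * times and count how many resets end with the register value restored.
 */
static int model_count_recoveries(bool reusable, int resets)
{
	const uint32_t programmed = 0x00100007; /* arbitrary example value */
	struct model_dev dev = { .config = { [1] = programmed } };
	int recovered = 0;

	model_save_state(&dev);
	for (int i = 0; i < resets; i++) {
		bool ok;

		model_reset(&dev);
		ok = reusable ? model_restore_reusable(&dev)
			      : model_restore_once(&dev);
		if (ok && dev.config[1] == programmed)
			recovered++;
	}
	return recovered;
}
```

In this model, the restore-once variant recovers only the first of several consecutive resets unless the caller saves again after each restore, which is the kind of workaround the second patch removes from drivers. The reusable variant recovers every reset from the single initial snapshot.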

Lukas Wunner (2):
  PCI: Ensure error recoverability at all times
  treewide: Drop pci_save_state() after pci_restore_state()

 drivers/crypto/intel/qat/qat_common/adf_aer.c    | 2 --
 drivers/dma/ioat/init.c                          | 1 -
 drivers/net/ethernet/broadcom/bnx2.c             | 2 --
 drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 -
 drivers/net/ethernet/broadcom/tg3.c              | 1 -
 drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c  | 1 -
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  | 2 --
 drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c | 1 -
 drivers/net/ethernet/intel/e1000e/netdev.c       | 1 -
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c     | 6 ------
 drivers/net/ethernet/intel/i40e/i40e_main.c      | 1 -
 drivers/net/ethernet/intel/ice/ice_main.c        | 2 --
 drivers/net/ethernet/intel/igb/igb_main.c        | 2 --
 drivers/net/ethernet/intel/igc/igc_main.c        | 2 --
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 1 -
 drivers/net/ethernet/mellanox/mlx4/main.c        | 1 -
 drivers/net/ethernet/mellanox/mlx5/core/main.c   | 1 -
 drivers/net/ethernet/meta/fbnic/fbnic_pci.c      | 1 -
 drivers/net/ethernet/microchip/lan743x_main.c    | 1 -
 drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 4 ----
 drivers/net/ethernet/neterion/s2io.c             | 1 -
 drivers/pci/bus.c                                | 7 +++++++
 drivers/pci/pci.c                                | 3 ---
 drivers/pci/pcie/portdrv.c                       | 1 -
 drivers/pci/probe.c                              | 2 --
 drivers/scsi/bfa/bfad.c                          | 1 -
 drivers/scsi/csiostor/csio_init.c                | 1 -
 drivers/scsi/ipr.c                               | 1 -
 drivers/scsi/lpfc/lpfc_init.c                    | 6 ------
 drivers/scsi/qla2xxx/qla_os.c                    | 5 -----
 drivers/scsi/qla4xxx/ql4_os.c                    | 5 -----
 drivers/tty/serial/8250/8250_pci.c               | 1 -
 drivers/tty/serial/jsm/jsm_driver.c              | 1 -
 33 files changed, 7 insertions(+), 62 deletions(-)

Applied to pci/err, maybe for v6.19?

It touches a lot of drivers, so it'd be nice to have more time in
-next, but it is mostly in error recovery paths that aren't going to
be exercised much anyway.

I'll watch for a minor update of comments and update if I see it.

Thanks a lot for your work and description of this.  It's a big step
in my understanding of PM and error recovery.  Which still leaves me
mostly ignorant, just slightly less so.

Bjorn