Re: [PATCH 0/2] PCI: Universal error recoverability of devices
From: Bjorn Helgaas <helgaas@kernel.org>
Date: 2025-11-14 23:45:50
On Sun, Oct 12, 2025 at 03:25:00PM +0200, Lukas Wunner wrote:
> When PCI devices are reset -- either to recover from an error or after a
> D3hot/D3cold transition -- their Config Space needs to be restored.
>
> D3hot/D3cold transitions happen under the control of the kernel, hence
> it is able to save Config Space before and restore it afterwards.
>
> However errors may occur unexpectedly and it may then be impossible to
> save Config Space because the device may be inaccessible (e.g. DPC) or
> Config Space may be corrupted. So it must be saved ahead of time.
>
> This isn't done consistently because the PCI core doesn't take care of
> it and only a subset of drivers do. The situation is aggravated by the
> behavior of pci_restore_state(), which only allows restoring Config
> Space once and invalidates the saved copy afterwards.
>
> Solve all these problems by saving an initial copy of Config Space on
> device addition which drivers may update if they change registers.
> Modify pci_restore_state() to allow using the saved copy indefinitely
> and drop all the workarounds for its previous behavior that have
> accumulated in the tree.
>
> Lukas Wunner (2):
>   PCI: Ensure error recoverability at all times
>   treewide: Drop pci_save_state() after pci_restore_state()
>
>  drivers/crypto/intel/qat/qat_common/adf_aer.c    | 2 --
>  drivers/dma/ioat/init.c                          | 1 -
>  drivers/net/ethernet/broadcom/bnx2.c             | 2 --
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 -
>  drivers/net/ethernet/broadcom/tg3.c              | 1 -
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c  | 1 -
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  | 2 --
>  drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c | 1 -
>  drivers/net/ethernet/intel/e1000e/netdev.c       | 1 -
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c     | 6 ------
>  drivers/net/ethernet/intel/i40e/i40e_main.c      | 1 -
>  drivers/net/ethernet/intel/ice/ice_main.c        | 2 --
>  drivers/net/ethernet/intel/igb/igb_main.c        | 2 --
>  drivers/net/ethernet/intel/igc/igc_main.c        | 2 --
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 1 -
>  drivers/net/ethernet/mellanox/mlx4/main.c        | 1 -
>  drivers/net/ethernet/mellanox/mlx5/core/main.c   | 1 -
>  drivers/net/ethernet/meta/fbnic/fbnic_pci.c      | 1 -
>  drivers/net/ethernet/microchip/lan743x_main.c    | 1 -
>  drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 4 ----
>  drivers/net/ethernet/neterion/s2io.c             | 1 -
>  drivers/pci/bus.c                                | 7 +++++++
>  drivers/pci/pci.c                                | 3 ---
>  drivers/pci/pcie/portdrv.c                       | 1 -
>  drivers/pci/probe.c                              | 2 --
>  drivers/scsi/bfa/bfad.c                          | 1 -
>  drivers/scsi/csiostor/csio_init.c                | 1 -
>  drivers/scsi/ipr.c                               | 1 -
>  drivers/scsi/lpfc/lpfc_init.c                    | 6 ------
>  drivers/scsi/qla2xxx/qla_os.c                    | 5 -----
>  drivers/scsi/qla4xxx/ql4_os.c                    | 5 -----
>  drivers/tty/serial/8250/8250_pci.c               | 1 -
>  drivers/tty/serial/jsm/jsm_driver.c              | 1 -
>  33 files changed, 7 insertions(+), 62 deletions(-)
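
For readers skimming the thread, the restore-once problem described in the cover letter can be sketched with a toy user-space model. This is only an illustration under stated assumptions, not code from the patches: struct toy_dev and every toy_* function are hypothetical names, and the register array merely stands in for Config Space. The only behavior modeled is what the cover letter states: the old restore invalidates the saved copy after one use, the new one leaves it usable.

```c
/* Toy model only; none of these names are kernel APIs. */
#include <stdbool.h>
#include <string.h>

#define TOY_NREGS 4

struct toy_dev {
	unsigned int live[TOY_NREGS];	/* stands in for Config Space */
	unsigned int saved[TOY_NREGS];	/* copy taken ahead of time */
	bool state_saved;		/* is the saved copy usable? */
};

/* Take a copy of the live registers and mark it usable. */
static void toy_save_state(struct toy_dev *d)
{
	memcpy(d->saved, d->live, sizeof(d->saved));
	d->state_saved = true;
}

/* A reset wipes the live registers, as after an error or D3cold. */
static void toy_reset(struct toy_dev *d)
{
	memset(d->live, 0, sizeof(d->live));
}

/*
 * Previous behavior as described in the cover letter: restore once,
 * then invalidate the saved copy. Returns false if nothing was restored.
 */
static bool toy_restore_state_once(struct toy_dev *d)
{
	if (!d->state_saved)
		return false;
	memcpy(d->live, d->saved, sizeof(d->live));
	d->state_saved = false;
	return true;
}

/* Proposed behavior: the saved copy stays usable indefinitely. */
static bool toy_restore_state_reusable(struct toy_dev *d)
{
	if (!d->state_saved)
		return false;
	memcpy(d->live, d->saved, sizeof(d->live));
	return true;
}
```

In this model, a second toy_reset() followed by toy_restore_state_once() restores nothing unless the caller saved again after the first restore, which is the workaround the second patch drops from drivers. With toy_restore_state_reusable(), one save ahead of time covers any number of resets.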

Applied to pci/err, maybe for v6.19? It touches a lot of drivers, so
it'd be nice to have more time in -next, but it is mostly in error
recovery paths that aren't going to be exercised much anyway.

I'll watch for a minor update of comments and update if I see it.

Thanks a lot for your work and description of this. It's a big step
in my understanding of PM and error recovery. Which still leaves me
mostly ignorant, just slightly less so.

Bjorn