Re: [PATCH 0/5] s390/pci: automatic error recovery
From: Niklas Schnelle <schnelle@linux.ibm.com>
Date: 2021-09-07 12:21:52
Also in:
linuxppc-dev, lkml
On Tue, 2021-09-07 at 10:45 +0200, Niklas Schnelle wrote:
On Tue, 2021-09-07 at 12:04 +1000, Oliver O'Halloran wrote:quoted
On Mon, Sep 6, 2021 at 7:49 PM Niklas Schnelle [off-list ref] wrote:quoted
Patch 3 I already sent separately resulting in the discussion below but without a final conclusion. https://lore.kernel.org/lkml/20210720150145.640727-1-schnelle@linux.ibm.com/ (local) I believe even though there were some doubts about the use of pci_dev_is_added() by arch code the existing uses as well as the use in the final patch of this series warrant this export.The use of pci_dev_is_added() in arch/powerpc was because in the past pci_bus_add_device() could be called before pci_device_add(). That was fixed a while ago so It should be safe to remove those calls now.Hmm, ok that confirms Bjorns suspicion and explains how it came to be. I can certainly sent a patch for that. This would then leave only the existing use in s390 which I added because of a dead lock prevention and explained here: https://lore.kernel.org/lkml/87d15d5eead35c9eaa667958d057cf4a81a8bf13.camel@linux.ibm.com/ (local) Plus the need to use it in the recovery code of this series. I think in the EEH code the need for a similar check is alleviated by the checks in the beginning of arch/powerpc/kernel/eeh_driver.c:eeh_handle_normal_event() especially eeh_slot_presence_check() which checks presence via the hotplug slot. I guess we could use our own state tracking in a similar way but felt like pci_dev_is_added() is the more logical choice.
Looking into this again, I think we actually can't easily track this state ourselves outside struct pci_dev. The reason for this is that when e.g. arch/s390/pci/pci_sysfs.c:recover_store() removes the struct pci_dev and scans it again the new struct pci_dev re-uses the same struct zpci_dev because from a platform point of view the PCI device was never removed but only disabled and re-enabled. Thus we can only distinguish the stale struct pci_dev by looking at things stored in struct pci_dev itself. That said, I think for the recovery case we might be able to drop the pci_dev_is_added() and rely on pdev->driver != NULL which we check anyway and that should catch any PCI device that was already removed.