Re: [PATCH] powerpc/pci: Fix PHB removal/rescan on PowerNV
From: Cédric Le Goater <clg@kaod.org>
Date: 2020-10-08 04:38:46
Also in:
linux-pci
On 10/8/20 4:23 AM, Oliver O'Halloran wrote:
On Fri, Sep 25, 2020 at 7:23 PM Cédric Le Goater [off-list ref] wrote:quoted
To fix an issue with PHB hotplug on pSeries machine (HPT/XIVE), commit 3a3181e16fbd introduced a PPC specific pcibios_remove_bus() routine to clear all interrupt mappings when the bus is removed. This routine frees an array allocated in pcibios_scan_phb(). This broke PHB hotplug on PowerNV because, when a PHB is removed and re-scanned through sysfs, the PCI layer un-assigns and re-assigns resources to the PHB but does not destroy and recreate the PCI controller structure. Since pcibios_remove_bus() does not clear the 'irq_map' array pointer, a second removal of the PHB will try to free the array a second time and corrupt memory."PHB hotplug" and "hot-plugging devices under a PHB" are different things. What you're saying here doesn't make a whole lot of sense to me unless you're conflating the two. The distinction is important since on pseries we can use DLPAR to add and remove actual PHBs (i.e. the pci_controller) at runtime, but there's no corresponding mechanism on PowerNV.
And it's even different on QEMU.
quoted
Free the 'irq_map' array in pcibios_free_controller() to fix corruption and clear interrupt mapping after it has been disposed. This to avoid filling up the array with successive remove/rescan of a bus.Even with this patch I think we're still broken. With this patch applied the init path is something like: per-phb init: allocate phb->irq_map array per-bus init: nothing per-device init: pcibios_bus_add_device() pci_read_irq_line() pci_irq_map_register(pci_dev, virq) *record the device's interrupt in phb->irq_map* And the teardown path: per-device teardown: nothing per-bus teardown: pcibios_remove_bus() pci_irq_map_dispose() *walk phb->irq_map and dispose of each mapped interrupt* per-phb teardown: free(phb->irq_map) There's a lot of asymmetry here, which is a problem in itself, but the real issue is that when removing *any* pci_bus under a PHB we dispose of the LSI\ for *every* device under that PHB. Not good. Ideally we should be fixing this by having the per-device teardown handle disposing the mapping. Unfortunately, there's no pcibios hook that's called when removing a pci_dev. However, we can register a bus notifier which will be called when the pci_dev is removed from its bus and we already do that for the per-device EEH teardown and to handle IOMMU TCE invalidation when the device is removed.
I lack the knowledge here and I think some else should take over, as I am not doing a good job. Michael, can you drop the initial patch again :/ It is better not to merge anything. Thanks, C.