Re: [PATCH v2 2/8] powerpc/eeh: More relaxed hotplug criterion
From: Gavin Shan <hidden>
Date: 2015-10-12 23:26:12
On Tue, Oct 13, 2015 at 09:55:53AM +1100, Daniel Axtens wrote:
> > Currently, we rely on the existence of struct pci_driver::err_handler to judge whether the corresponding PCI device should be unplugged during EEH recovery (the partial hotplug case). However, that check isn't precise enough. Some device drivers implement only part of the EEH error handlers, to collect diag-data, which means the driver still expects a hotplug to recover from the EEH error.
> >
> > This makes the hotplug criterion more relaxed: if the device driver doesn't provide all the necessary EEH error handlers, its device will be hotplugged during EEH recovery.
>
> Interesting. My understanding of Documentation/PCI/pci-error-recovery.txt is that a driver should be able to supply just an error_detected() callback. If the driver only wants to collect diag-data and wants to be hotplugged, it should return PCI_ERS_RESULT_NONE. What drivers did you have in mind?
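The old and new criteria can be compared side by side in a small userspace model. This is only a sketch: `struct err_handlers_model`, `keeps_device_old()`, `keeps_device_new()` and `dummy_cb()` are hypothetical names, and the struct is a simplified stand-in for the kernel's `struct pci_error_handlers`, keeping just the three callbacks the patch checks. "Keeps" means `eeh_rmv_device()` would return early instead of removing the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for part of struct pci_error_handlers:
 * only the three callbacks the patch tests, as opaque pointers. */
typedef int (*err_cb_t)(void);

struct err_handlers_model {
	err_cb_t error_detected;
	err_cb_t slot_reset;
	err_cb_t resume;
};

/* Old criterion: any err_handler at all means "don't unplug". */
static bool keeps_device_old(const struct err_handlers_model *h)
{
	return h != NULL;
}

/* New criterion (as in the patch): the device is kept only when all
 * three callbacks exist; otherwise it is hotplugged during recovery. */
static bool keeps_device_new(const struct err_handlers_model *h)
{
	return h != NULL &&
	       h->error_detected != NULL &&
	       h->slot_reset != NULL &&
	       h->resume != NULL;
}

/* Placeholder callback used to populate the model. */
static int dummy_cb(void)
{
	return 0;
}
```

A driver that only fills in error_detected() is kept under the old check but unplugged under the new one; a driver with all three callbacks is kept under both.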
Daniel,

The issue is tracked by IBM's bugzilla 127612, reported against NVIDIA's private GPU drivers. I tried to find the source code in the upstream kernel, but couldn't.

For example, suppose one PE has two different devices, A and B. A's driver provides error_detected()/slot_reset()/resume(), and its error_detected() returns NEED_RESET. B's driver provides only error_detected(), which returns NONE as you suggested. Because NEED_RESET takes precedence over NONE when the EEH core combines the drivers' results, the core sees NEED_RESET; and since B's driver does have an err_handler, B won't be hotplugged during recovery. So the error won't be recovered on B.

Thanks,
Gavin
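The A/B scenario can be simulated to show why returning NONE from B is not enough. This is a sketch under stated assumptions: the merge rule (NEED_RESET always wins, and NONE is replaced by any later result) is assumed to mirror the result combining done in `eeh_report_error()` in eeh_driver.c at the time, devices whose driver lacks error_detected() are assumed not to vote, and `enum ers_result`, `struct dev_model`, `merge_result()`, `pe_result()`, `unplugged_old()` and `unplugged_new()` are hypothetical names. Other conditions in `eeh_rmv_device()` are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for a few pci_ers_result_t values. */
enum ers_result { ERS_NONE, ERS_CAN_RECOVER, ERS_NEED_RESET };

/* Hypothetical per-device model: what the driver provides and what
 * its error_detected() returns. */
struct dev_model {
	bool has_err_handler;
	bool has_error_detected;
	bool has_slot_reset;
	bool has_resume;
	enum ers_result detected_rc;
};

/* Assumed combining rule: NEED_RESET trumps everything; a NONE so far
 * is replaced by whatever the next driver reports. */
static enum ers_result merge_result(enum ers_result acc, enum ers_result rc)
{
	if (rc == ERS_NEED_RESET)
		return rc;
	if (acc == ERS_NONE)
		return rc;
	return acc;
}

/* Combined result for a PE: only drivers with error_detected() vote. */
static enum ers_result pe_result(const struct dev_model *devs, size_t n)
{
	enum ers_result acc = ERS_NONE;

	for (size_t i = 0; i < n; i++)
		if (devs[i].has_err_handler && devs[i].has_error_detected)
			acc = merge_result(acc, devs[i].detected_rc);
	return acc;
}

/* Old criterion: removed only if the driver has no err_handler. */
static bool unplugged_old(const struct dev_model *d)
{
	return !d->has_err_handler;
}

/* New criterion: removed unless all three callbacks are provided. */
static bool unplugged_new(const struct dev_model *d)
{
	return !(d->has_err_handler && d->has_error_detected &&
		 d->has_slot_reset && d->has_resume);
}
```

With A = { true, true, true, true, ERS_NEED_RESET } and B = { true, true, false, false, ERS_NONE }, the PE result is ERS_NEED_RESET in either order, B is not unplugged under the old check, and it is unplugged under the new one.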
Signed-off-by: Gavin Shan <redacted>
---
 arch/powerpc/kernel/eeh_driver.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 3a626ed..32178a4 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (driver) {
 		eeh_pcid_put(dev);
-		if (driver->err_handler)
+		if (driver->err_handler &&
+		    driver->err_handler->error_detected &&
+		    driver->err_handler->slot_reset &&
+		    driver->err_handler->resume)
 			return NULL;
 	}
 
-- 
2.1.0

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev