Thread (22 messages, 4 authors, 2015-10-21)

Re: [PATCH v2 2/8] powerpc/eeh: More relaxed hotplug criterion

From: Gavin Shan <hidden>
Date: 2015-10-12 23:26:12

On Tue, Oct 13, 2015 at 09:55:53AM +1100, Daniel Axtens wrote:
> > Currently, we rely on the existence of struct pci_driver::err_handler
> > to judge if the corresponding PCI device should be unplugged during
> > EEH recovery (partial hotplug case). However, that's not precise enough:
> > some device drivers implement only part of the EEH error handlers
> > to collect diag-data. That means the driver still expects a hotplug
> > to recover from the EEH error.
> >
> > This makes the hotplug criterion more relaxed: if the device driver
> > doesn't provide all necessary EEH error handlers, the device will be
> > hot-plugged during EEH recovery.
> Interesting.
>
> My understanding of Documentation/PCI/pci-error-recovery.txt is that a
> driver should be able to just supply an error_detected() callback. If
> the driver just wants to collect diag-data and wants to be hotplugged,
> it should return PCI_ERS_RESULT_NONE.
>
> What drivers did you have in mind?

Daniel, the issue is tracked in IBM's bugzilla 127612 and was reported with
Nvidia's private GPU drivers. I tried to find the source code in the upstream
kernel, but couldn't.

Taking an example: one PE has two different devices, A and B. A's driver
provides error_detected()/slot_reset()/resume() and returns NEED_RESET.
B's driver provides only error_detected(), which returns NONE as you said.
The EEH core receives NEED_RESET, so B won't be hot-plugged during recovery,
and the error won't be recovered on B.

Thanks,
Gavin
Signed-off-by: Gavin Shan <redacted>
---
 arch/powerpc/kernel/eeh_driver.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 3a626ed..32178a4 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (driver) {
 		eeh_pcid_put(dev);
-		if (driver->err_handler)
+		if (driver->err_handler &&
+		    driver->err_handler->error_detected &&
+		    driver->err_handler->slot_reset &&
+		    driver->err_handler->resume)
 			return NULL;
 	}
 
-- 
2.1.0

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev