Thread: 22 messages, 4 authors, 2015-10-21

Re: [PATCH v2 2/8] powerpc/eeh: More relaxed hotplug criterion

From: Daniel Axtens <hidden>
Date: 2015-10-13 02:49:21

Gavin Shan writes:
Daniel, the issue is tracked by IBM's Bugzilla 127612, reported against Nvidia's
private GPU drivers. I tried to find the source code in the upstream kernel,
but failed.
OK. So I've read the internal bug, and I'm going to do my best to summarise
without including confidential info.

 1) A PHB with 2 devices is fenced via error injection.

 2) The error_detected() callback is run on both devices. One returns
    CAN_RECOVER; the other returns NONE.
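To make step 2 concrete, here's a small userspace model of how several error_detected() answers could be folded into one result for the whole PE. The enum names and the merge rule (NEED_RESET wins, otherwise the first non-NONE answer is kept) are assumptions for illustration, not a copy of eeh_driver.c. Under that rule, whichever order the two devices answer in, the merged result is CAN_RECOVER, so the fact that one driver returned NONE is no longer visible once the answers are combined.

```c
/* Simplified stand-ins for the kernel's pci_ers_result_t values
 * (illustrative only, not the kernel's definitions). */
enum ers_result {
	ERS_NONE,
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
};

/* Assumed merge rule: NEED_RESET overrides everything; otherwise the
 * first non-NONE answer is kept. */
static enum ers_result merge_result(enum ers_result acc, enum ers_result rc)
{
	if (rc == ERS_NEED_RESET)
		return rc;
	if (acc == ERS_NONE)
		return rc;
	return acc;
}

/* Fold the per-device answers into a single PE-wide result. */
static enum ers_result merge_all(const enum ers_result *rcs, int n)
{
	enum ers_result acc = ERS_NONE;

	for (int i = 0; i < n; i++)
		acc = merge_result(acc, rcs[i]);
	return acc;
}
```

With { ERS_CAN_RECOVER, ERS_NONE } or { ERS_NONE, ERS_CAN_RECOVER } as input, merge_all() returns ERS_CAN_RECOVER in both cases.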

We then fall through to partial-hotplug handling. (BTW this isn't
documented in Documentation/PCI/pci-error-recovery.txt, so at some point
this should be fixed!)

Partial hotplug is detected by the presence of an err_handler, not by
storing the result of error_detected(). Would it be better to store the
result from eeh_report_error in the eeh_dev structure, rather than
looking at more elements of the err_handler structure?
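A minimal sketch of that suggestion, assuming a hypothetical per-device field (last_result below is NOT a real eeh_dev member) that records what error_detected() returned. The removal decision then looks at what the driver actually answered, rather than at which callbacks it happens to register. This is a userspace model of the idea only, not kernel code.

```c
#include <stdbool.h>

enum ers_result { ERS_NONE, ERS_CAN_RECOVER, ERS_NEED_RESET };

/* Hypothetical per-device state; "last_result" is not a real eeh_dev field. */
struct model_eeh_dev {
	bool has_err_handler;        /* driver registered an err_handler */
	enum ers_result last_result; /* what error_detected() returned */
};

/* Criterion before this patch: any err_handler means "don't remove". */
static bool keep_by_handler(const struct model_eeh_dev *edev)
{
	return edev->has_err_handler;
}

/* Proposed alternative: keep the device only if its driver took part in
 * recovery, i.e. it has a handler and did not answer NONE. */
static bool keep_by_result(const struct model_eeh_dev *edev)
{
	return edev->has_err_handler && edev->last_result != ERS_NONE;
}
```

A driver that registers a handler but answers NONE is kept by keep_by_handler() but would be removed and re-probed under keep_by_result().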

More generally, having drivers use error_detected() and then return NONE
as a way to get data without participating in EEH is a hack, and it's not
surprising that it breaks in odd ways, especially with partial hotplug.

Partial hotplug is pretty hacky to begin with, and a driver being able
to opt out of EEH selectively is a useful feature, so we probably want
to redesign the state machine to handle them both better. That would be
a long-term project.

Regards,
Daniel
Thanks,
Gavin
Signed-off-by: Gavin Shan <redacted>
---
 arch/powerpc/kernel/eeh_driver.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 3a626ed..32178a4 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -416,7 +416,10 @@ static void *eeh_rmv_device(void *data, void *userdata)
 	driver = eeh_pcid_get(dev);
 	if (driver) {
 		eeh_pcid_put(dev);
-		if (driver->err_handler)
+		if (driver->err_handler &&
+		    driver->err_handler->error_detected &&
+		    driver->err_handler->slot_reset &&
+		    driver->err_handler->resume)
 			return NULL;
 	}
 
-- 
2.1.0
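The change in eeh_rmv_device() can be modelled outside the kernel as two predicates over a driver's error-handler table. The types below are simplified, hypothetical stand-ins, not the kernel's struct pci_error_handlers or its callback signatures. The point is that a driver supplying only error_detected() is exempted from removal by the old test but not by the new one, so it would now go through partial hotplug.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins; the real kernel callbacks take more arguments. */
struct fake_dev;
typedef int (*detected_fn)(struct fake_dev *dev);
typedef int (*reset_fn)(struct fake_dev *dev);
typedef void (*resume_fn)(struct fake_dev *dev);

struct model_err_handlers {
	detected_fn error_detected;
	reset_fn slot_reset;
	resume_fn resume;
};

/* Pre-patch test: any err_handler means "return NULL", i.e. don't remove. */
static bool skip_removal_old(const struct model_err_handlers *h)
{
	return h != NULL;
}

/* Post-patch test: skip removal only if all three callbacks are present. */
static bool skip_removal_new(const struct model_err_handlers *h)
{
	return h && h->error_detected && h->slot_reset && h->resume;
}

/* Trivial callbacks used to build example handler tables. */
static int detect_stub(struct fake_dev *dev) { (void)dev; return 0; }
static int reset_stub(struct fake_dev *dev) { (void)dev; return 0; }
static void resume_stub(struct fake_dev *dev) { (void)dev; }
```

For example, { detect_stub, NULL, NULL } passes skip_removal_old() but fails skip_removal_new(), while { detect_stub, reset_stub, resume_stub } passes both.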

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
