Re: [PATCH] Hold reference to device_node during EEH event handling
From: Michael Ellerman <hidden>
Date: 2009-07-17 00:36:14
On Thu, 2009-07-16 at 09:33 -0700, Mike Mason wrote:
> Michael Ellerman wrote:
> > On Wed, 2009-07-15 at 14:43 -0700, Mike Mason wrote:
> > > This patch increments the device_node reference counter when an EEH
> > > error occurs and decrements the counter when the event has been
> > > handled. This is to prevent the device_node from being released
> > > until eeh_event_handler() has had a chance to deal with the event.
> > > We've seen cases where the device_node is released too soon when an
> > > EEH event occurs during a dlpar remove, causing the event handler to
> > > attempt to access bad memory locations. Please review and let me
> > > know of any concerns.
> >
> > Taking a reference sounds sane, but ...
> >
> > > Signed-off-by: Mike Mason <redacted>
> > >
> > > --- a/arch/powerpc/platforms/pseries/eeh_event.c 2008-10-09 15:13:53.000000000 -0700
> > > +++ b/arch/powerpc/platforms/pseries/eeh_event.c 2009-07-14 14:14:00.000000000 -0700
> > > @@ -75,6 +75,14 @@ static int eeh_event_handler(void * dumm
> > >  	if (event == NULL)
> > >  		return 0;
> > >
> > > +	/* EEH holds a reference to the device_node, so if it
> > > +	 * equals 1 it's no longer valid and the event should
> > > +	 * be ignored */
> > > +	if (atomic_read(&event->dn->kref.refcount) == 1) {
> > > +		of_node_put(event->dn);
> > > +		return 0;
> > > +	}
> >
> > That's really gross :)
>
> Agreed. I'll look for another way to determine if device is gone and
> the event should be ignored. Suggestions are welcome :-)
Benh and I had a quick chat about it, and were wondering whether what you really should be doing is taking a reference to the pci device (perhaps as well as the device node).
> > > @@ -140,7 +149,7 @@ int eeh_send_failure_event (struct devic
> > >  	if (dev)
> > >  		pci_dev_get(dev);
> > >
> > > -	event->dn = dn;
> > > +	event->dn = of_node_get(dn);
> > >  	event->dev = dev;
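
As an illustration of the pattern in this hunk, here is a minimal userspace sketch of "take a reference when the event is queued, drop it when the event has been handled". It is not kernel code: struct refobj, ref_get(), ref_put(), queue_event() and finish_event() are hypothetical stand-ins for the device_node/pci_dev objects and for of_node_get()/pci_dev_get() and their put counterparts, and the model has no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object (device_node or pci_dev). */
struct refobj {
	int refcount;
	bool released;	/* set when the last reference is dropped */
};

struct refobj *ref_get(struct refobj *o)
{
	if (o)
		o->refcount++;
	return o;
}

void ref_put(struct refobj *o)
{
	if (!o)
		return;
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->released = true;	/* a real release callback would free here */
}

/* Models a queued event that pins both objects it will later touch. */
struct queued_event {
	struct refobj *dn;
	struct refobj *dev;
};

/* Models the queueing side: take both references before the event is visible. */
struct queued_event *queue_event(struct refobj *dn, struct refobj *dev)
{
	struct queued_event *ev = malloc(sizeof(*ev));

	if (!ev)
		return NULL;
	ev->dn = ref_get(dn);
	ev->dev = ref_get(dev);
	return ev;
}

/* Models the end of the handler: drop both references, then free the event. */
void finish_event(struct queued_event *ev)
{
	ref_put(ev->dn);
	ref_put(ev->dev);
	free(ev);
}
```

In this model, a remove that drops the owner's references while the event is still queued does not release either object; release happens only in finish_event(). Note that the context lines of the hunk already show pci_dev_get(dev) being called when dev is non-NULL.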
pci devs are refcounted too, see pci_dev_get(), so taking a reference
there would be the "right" thing to do - otherwise there's no guarantee
it still exists later, unless there's some other trick in the EEH code.

Taking a reference would presumably block a concurrent hotunplug until
you'd processed the EEH event and dropped your reference. That might be
OK, or you could add a hotplug notifier to the EEH code and drop the
reference there and mark the event as handled or something.

All of that with the caveat that I don't really know the EEH or hotplug
code :D

cheers
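
The notifier alternative mentioned above could look roughly like the following userspace sketch: a removal callback drops the event's reference early and marks the event as cancelled, so the handler checks an explicit flag instead of peeking at the refcount. All names here (struct refobj, struct pending_event, event_device_removed(), handle_event()) are hypothetical, and the model omits the locking that real code would need between the notifier and the handler thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted device object. */
struct refobj {
	int refcount;
	bool released;
};

struct refobj *ref_get(struct refobj *o)
{
	if (o)
		o->refcount++;
	return o;
}

void ref_put(struct refobj *o)
{
	if (!o)
		return;
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->released = true;
}

/* A pending event: dev is NULL once its reference has been dropped. */
struct pending_event {
	struct refobj *dev;
	bool cancelled;
};

/* Models a hotplug notifier callback: release early so unplug is not blocked. */
void event_device_removed(struct pending_event *ev, struct refobj *gone)
{
	if (ev->dev == gone) {
		ev->cancelled = true;
		ref_put(ev->dev);
		ev->dev = NULL;
	}
}

/* Returns true if recovery would run, false if the event was skipped. */
bool handle_event(struct pending_event *ev)
{
	if (ev->cancelled)
		return false;	/* device went away: nothing left to recover */

	/* ... recovery would use ev->dev here ... */
	ref_put(ev->dev);
	ev->dev = NULL;
	return true;
}
```

The trade-off in this sketch is that unplug never waits on the event, at the cost of an extra state flag that has to be kept consistent with the reference.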