Thread: 83 messages, 3 authors, 2013-02-05

Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework

From: Greg KH <gregkh@linuxfoundation.org>
Date: 2013-02-04 12:45:59
Also in: linux-acpi, linux-mm, linux-s390, lkml

On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> Yes, but those are just remove events and we can only see how destructive they
> were after the removal.  The point is to be able to figure out whether or not
> we *want* to do the removal in the first place.
>
> Say you have a computing node which signals a hardware problem in a processor
> package (the container with CPU cores, memory, PCI host bridge etc.).  You
> may want to eject that package, but you don't want to kill the system this
> way.  So if the eject is doable, it is very much desirable to do it, but if it
> is not doable, you'd rather shut the box down and do the replacement afterward.
> That may be costly, however (maybe weeks of computations), so it should be
> avoided if possible, but not at the expense of crashing the box if the eject
> doesn't work out.
>
> It seems to me that we could handle that with the help of a new flag, say
> "no_eject", in struct device, a global mutex, and a function that will walk
> the given subtree of the device hierarchy and check if "no_eject" is set for
> any devices in there.  Plus a global "no_eject" switch, perhaps.

I think this will always be racy, or at worst, slow things down on
normal device operations as you will always be having to grab this flag
whenever you want to do something new.

See my comments earlier about pci hotplug and the design decisions there
about "no eject" capabilities for why.
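
For illustration only, here is a minimal userspace sketch of the scheme
proposed in the quoted text: a per-device "no_eject" flag, one global mutex,
and a walk over a subtree. Everything in it is an assumption made for the
example, not code from the patch set: `struct dev_node` is a hypothetical
stand-in for struct device, the function names are invented, and a pthread
mutex stands in for a kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device; not the kernel structure. */
struct dev_node {
    const char *name;
    bool no_eject;             /* set while a user cannot tolerate removal */
    struct dev_node *child;    /* first child in the hierarchy */
    struct dev_node *sibling;  /* next sibling under the same parent */
};

/* One global lock: every flag update and every eject check takes it. */
static pthread_mutex_t eject_lock = PTHREAD_MUTEX_INITIALIZER;

/* Link a child under a parent (prepends to the parent's child list). */
void dev_add_child(struct dev_node *parent, struct dev_node *child)
{
    assert(parent != NULL && child != NULL);
    child->sibling = parent->child;
    parent->child = child;
}

/* Every caller that changes the flag pays for the global lock. */
void dev_set_no_eject(struct dev_node *dev, bool value)
{
    pthread_mutex_lock(&eject_lock);
    dev->no_eject = value;
    pthread_mutex_unlock(&eject_lock);
}

/* Caller must hold eject_lock. True if any node in the subtree is flagged. */
bool subtree_has_no_eject(const struct dev_node *dev)
{
    if (dev->no_eject)
        return true;
    for (const struct dev_node *c = dev->child; c != NULL; c = c->sibling)
        if (subtree_has_no_eject(c))
            return true;
    return false;
}

/*
 * Check the subtree and, if nothing is flagged, run the eject callback
 * while still holding the lock, so no flag can be set in between.
 * Returns true if the subtree was ejectable; do_eject may be NULL.
 */
bool dev_try_eject(struct dev_node *root, void (*do_eject)(struct dev_node *))
{
    bool ok;

    pthread_mutex_lock(&eject_lock);
    ok = !subtree_has_no_eject(root);
    if (ok && do_eject != NULL)
        do_eject(root);
    pthread_mutex_unlock(&eject_lock);
    return ok;
}
```

In this sketch the check is only meaningful while eject_lock is held.
Releasing the lock between the check and the eject would reopen the race,
and holding it means every path that touches "no_eject" serializes on one
global mutex, which is the trade-off described above.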

thanks,

greg k-h