Thread: 5 messages, 3 authors, 2010-07-30

Re: [PATCH v2] powerpc/kexec: Fix orphaned offline CPUs across kexec

From: Michael Neuling <hidden>
Date: 2010-07-30 03:15:14
Also in: kexec

(adding kexec list to CC)

In message [ref] you wrote:
Michael Neuling wrote:
In message [ref] you wrote:
When CPU hotplug is used, some CPUs may be offline at the time a kexec is
performed.  The subsequent kernel may expect these CPUs to be already running,
and will declare them stuck.  On pseries, there's also a soft-offline (cede)
state that CPUs may be in; this can also cause problems as the kexeced kernel
may ask RTAS if they're online -- and RTAS would say they are.  Again, stuck.
This patch kicks each present offline CPU awake before the kexec, so that
none are lost to these assumptions in the subsequent kernel.
There are a lot of cleanups in this patch.  The change you are making
would be a lot clearer without all the additional cleanups in there.  I
think I'd like to see this as two patches: one for the cleanups and one for
the addition of wake_offline_cpus().
Okay, I can split this: typo fixes and added debugging in one,
wake_offline_cpus() in another.
Thanks.
Other than that, I'm not completely convinced this is the functionality
we want.  Do we really want to online these CPUs?  Why were they
offlined in the first place?  I understand the stuck problem, but is the
solution to online them, or to change the device tree so that the second
kernel doesn't detect them as stuck?
Well... there are two cases.  If a CPU is soft-offlined on pseries, it
must be woken from that cede loop (in platforms/pseries/hotplug-cpu.c),
as we're replacing code under its feet.  Rather than fully onlining it,
we could special-case the wakeup from this cede loop so that the CPU
calls RTAS "stop-self" on itself properly.  (Kind of like a "wake to die".)
Makes sense.  
So that leaves hard-offline CPUs (perhaps including the above): I
don't know why they might have been offlined.  If it's something
serious, like fire, they'd be removed from the present set too (and
thus not be considered in this restarting case).  We could add a mask
to the CPU node to show which of the threads (if any) are running, and
alter the startup code to start everything if this mask doesn't exist
(non-kexec) or only online currently-running threads if the mask is
present.  That feels a little weird.

My reasoning for restarting everything was this: the first time you boot,
all of your present CPUs are started up.  When you reboot, any CPUs
you offlined for fun are restarted.  Kexec is (in this non-crash
sense) a user-initiated 'quick reboot', so I reasoned that it should
look the same as a 'hard reboot', and the new kernel would have
all available CPUs running, as usual.
OK, I like this justification.  It would be good to include it in the
check-in comment, since we're changing functionality somewhat.

Mikey