Thread: 15 messages, 4 authors, 2014-06-12

Re: [PATCH] powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode

From: Vivek Goyal <vgoyal@redhat.com>
Date: 2014-06-06 18:27:58
Also in: kexec, lkml

On Fri, Jun 06, 2014 at 06:00:43PM +0530, Srivatsa S. Bhat wrote:
On 06/04/2014 07:16 PM, Vivek Goyal wrote:
On Wed, Jun 04, 2014 at 08:09:25AM +1000, Benjamin Herrenschmidt wrote:
On Wed, 2014-06-04 at 01:58 +0530, Srivatsa S. Bhat wrote:
Yep, that makes sense. But unfortunately I don't have enough insight into
why exactly powerpc has to online the CPUs before doing a kexec. I just
know from the commit log and the comment mentioned above (and from my own
experiments) that the CPUs will get stuck if they were offline. Perhaps
somebody more knowledgeable can explain this in detail and suggest a proper
long-term solution.

Matt, Ben, any thoughts on this?
The problem is with our "soft offline" which we do on some platforms. When we
offline we don't actually send the CPUs back to firmware or anything like that.

We put them into a very low power loop inside Linux.

The new kernel has no way to extract them from that loop. So we must re-"online"
them before we kexec so they can be passed to the new kernel normally (or returned
to firmware like we do on powernv).
Srivatsa,

Looks like your patch has been merged.

I don't like the following change in arch independent code.

	/*
	 * migrate_to_reboot_cpu() disables CPU hotplug assuming that
	 * no further code needs to use CPU hotplug (which is true in
	 * the reboot case). However, the kexec path depends on using
	 * CPU hotplug again; so re-enable it here.
	 */
	cpu_hotplug_enable();

As it is a very powerpc-specific requirement, can you enable hotplug in powerpc
arch-dependent code as a short-term solution?
I didn't do that because that would mean that the _disable() would be
performed inside kernel/kexec.c and the corresponding _enable() would
be performed in arch/powerpc/kernel/machine_kexec_64.c -- with no apparent
connection between them, which would have made them hard to relate.
Which we are doing anyway. The difference is that now we are doing it
for all arches.

If this is a powerpc-specific requirement, then we should limit it to
powerpc only and not let it spill over into generic code.

And putting in a big fat comment should take care of being able to figure
out why arch code is overriding the generic code's decision. Putting
it in generic code and enforcing this on all arches does not buy us
anything, IMHO.

Ideally one needs to fix the requirement of onlining all CPUs on powerpc
as a long-term solution and then get rid of the hotplug enable call.
Yes, I agree. I'm trying out a solution at the moment (see the 4
preliminary patches I sent in my reply to Ben). If that works, we won't
need the enable call on powerpc.
Thanks. This will help.

Thanks
Vivek
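
The arrangement argued for in this message (generic code disables CPU hotplug,
and only the powerpc arch code re-enables it, under a comment explaining why)
can be sketched as a small userspace model. This is an illustration only and
not kernel code: every model_* name, the boolean flag, and the CPU counts are
hypothetical stand-ins, and the real kernel functions are not called.

```c
#include <stdbool.h>

/* Userspace model only; none of these are real kernel symbols. */
enum { NR_MODEL_CPUS = 4 };          /* assumed CPU count for the model */

static bool hotplug_disabled;        /* stands in for the disabled state */
static int cpus_online = 1;          /* assume only one CPU is online */

static void model_cpu_hotplug_disable(void) { hotplug_disabled = true; }
static void model_cpu_hotplug_enable(void)  { hotplug_disabled = false; }

/* Stand-in for onlining one CPU: refuses while hotplug is disabled. */
static int model_cpu_up(void)
{
    if (hotplug_disabled)
        return -1;
    if (cpus_online < NR_MODEL_CPUS)
        cpus_online++;
    return 0;
}

/* Generic path: disables hotplug, as the quoted comment describes. */
static void model_migrate_to_reboot_cpu(void)
{
    model_cpu_hotplug_disable();
}

typedef void (*model_arch_hook)(void);

/* Arches with no special requirement leave hotplug disabled. */
static void model_generic_arch_hook(void)
{
}

/*
 * powerpc-only (in this model): the generic code disabled CPU hotplug,
 * but soft-offlined CPUs must be onlined again before the new kernel
 * starts, so override that decision here and say why.
 */
static void model_powerpc_arch_hook(void)
{
    model_cpu_hotplug_enable();
    while (cpus_online < NR_MODEL_CPUS) {
        if (model_cpu_up() != 0)
            break;
    }
}

/* Generic kexec path: no arch-specific re-enable in common code. */
static void model_kernel_kexec(model_arch_hook hook)
{
    model_migrate_to_reboot_cpu();
    hook();
}
```

In this model, calling model_kernel_kexec(model_generic_arch_hook) leaves
hotplug disabled, while model_kernel_kexec(model_powerpc_arch_hook) brings
every modelled CPU online. The disable and enable calls end up in different
files, which is the objection raised above; the comment in the arch hook is
what ties them together.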