Re: [PATCH] powerpc: on crash, kexec'ed kernel needs all CPUs are online
From: Hari Bathini <hidden>
Date: 2015-11-05 10:24:39
On 11/05/2015 07:02 AM, David Gibson wrote:
> On Wed, 4 Nov 2015 14:54:51 +0100 Laurent Vivier [off-list ref] wrote:
>> On 04/11/2015 13:34, Hari Bathini wrote:
>>> On 10/16/2015 12:30 AM, Laurent Vivier wrote:
>>>> On kexec, all secondary offline CPUs are onlined before starting
>>>> the new kernel; this is not done in the case of kdump.
>>>>
>>>> If kdump is configured and a kernel crash occurs while some
>>>> secondary CPUs are offline (SMT=off), the new kernel is not able
>>>> to start them and displays "Processor X is stuck." messages.
>>>>
>>>> Starting with POWER8, subcore logic relies on all threads of a core
>>>> being booted. So, on startup, the kernel tries to start all threads,
>>>> and asks OPAL (or RTAS) to start all CPUs (including threads).
>>>>
>>>> If a CPU has been offlined by the previous kernel, it has not been
>>>> returned to OPAL, and thus OPAL cannot restart it: this CPU has
>>>> been lost...
>>>>
>>>> Signed-off-by: Laurent Vivier <redacted>
>>>
>>> Hi Laurent,
>>
>> Hi Hari,
>>
>>> Sorry for jumping too late into this.
>>
>> better late than never :)
>>
>>> Are you seeing this issue even with the below patches:
>>>
>>> pseries:
>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c1caae3de46a072d0855729aed6e793e536a4a55
>
> Unfortunately, this is unlikely to be relevant - this fixes a failure
> while setting up the kexec. The problem we see occurs once we've
> booted the second kernel and it's attempting to bring up secondary
> CPUs.
>
>>> opal/powernv:
>>> https://github.com/open-power/skiboot/commit/9ee56b5
>>
>> Very interesting. Is there a way to have a firmware with the fix?
>
> From Laurent's analysis of the crash, I don't think this will be
> relevant either, but I'm not sure. It would be very interesting to
> know which (if any) released firmwares include this patch so we can
> test it.

Hi Laurent/David,

I am not so sure about this. While I get back to you on it, can you
confirm whether you are seeing the issue on both PowerVM (pseries) and
bare metal (powernv)? Which kernel version shows the issue on PowerVM
and/or bare metal? Also, for bare metal, can you mention the OPAL
version on which the issue is reproducible?

If a bug has been raised for this, please point me to it so I can get
more information.

Thanks
Hari
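
For anyone reproducing this, it may help to record which CPUs were offline on the first kernel before the crash is triggered. The following is only a minimal sketch, assuming the standard Linux sysfs files `/sys/devices/system/cpu/present` and `/sys/devices/system/cpu/online`; the helper names (`parse_cpu_list`, `offline_cpus`, `report`) are made up for illustration and are not part of any existing tool. It does not report the OPAL version.

```python
#!/usr/bin/env python3
"""Report which present CPUs are offline, based on sysfs CPU lists."""
from pathlib import Path

CPU_SYSFS = Path("/sys/devices/system/cpu")  # standard Linux sysfs location


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0,8-15,24" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty list (no CPUs in this state)
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def offline_cpus(present_text, online_text):
    """Return the sorted CPU numbers that are present but not online."""
    return sorted(parse_cpu_list(present_text) - parse_cpu_list(online_text))


def report():
    """Print kernel release and CPU state of the running system."""
    present = (CPU_SYSFS / "present").read_text()
    online = (CPU_SYSFS / "online").read_text()
    release = Path("/proc/sys/kernel/osrelease").read_text().strip()
    print("kernel release:", release)
    print("present CPUs:  ", present.strip())
    print("online CPUs:   ", online.strip())
    print("offline CPUs:  ", offline_cpus(present, online))


if __name__ == "__main__":
    try:
        report()
    except OSError as err:  # e.g. sysfs not available on this system
        print("could not read CPU state:", err)
```

With SMT off, only the first thread of each core would be expected in the online list, and the remaining threads would show up as offline; those are the CPUs the kdump kernel then fails to start.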
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev