[PATCH 08/16] arm64/kexec: Add core kexec support
From: geoff@infradead.org (Geoff Levand)
Date: 2015-11-03 00:30:16
Also in:
kexec
Hi James, On Fri, 2015-10-30 at 16:29 +0000, James Morse wrote:
On 20/10/15 00:38, Geoff Levand wrote:quoted
+config KEXEC +> > > > depends on (!SMP || PM_SLEEP_SMP)Commit 4b3dc9679cf7 got rid of '!SMP'.
Fixed for v11.
quoted
- * Copyright (C) 2015 Huawei Futurewei Technologies. + * Copyright (C) Huawei Futurewei Technologies.Move this hunk into the patch that adds the file?
Was fixed in v10.2.
quoted
+++ b/arch/arm64/kernel/relocate_kernel.S
If I've followed all this through properly: With KVM - mmu+caches are configured, but then disabled by 'kvm: allows kvm cpu hotplug'. This 'arm64_relocate_new_kernel' function then runs at EL2 with M=0, C=0, I=0. Without KVM - when there is no user of EL2, the mmu+caches are left in whatever state the bootloader (or efi stub) left them in. From Documentation/arm64/booting.txt:quoted
Instruction cache may be on or off.andquoted
System caches which respect the architected cache maintenance by VA operations must be configured and may be enabled.So 'arm64_relocate_new_kernel' function could run at EL2 with M=0, C=?, I=?. I think this means you can't guarantee anything you are copying below actually makes it through the caches - booting secondary processors may get stale values. The EFI stub disables the M and C bits when booted at EL2 with uefi - but it leaves the instruction cache enabled. You only clean the reboot_code_buffer from the data cache, so there may be stale values in the instruction cache. I think you need to disable the i-cache at EL1. If you jump to EL2, I think you need to disable the I/C bits there too - as you can't rely on the code in 'kvm: allows kvm cpu hotplug' to do this in a non-kvm case.
For consistency across all code paths, we could put in something like this: + /* Clear SCTLR_ELx_FLAGS. */ + mrs x0, CurrentEL + cmp x0, #CurrentEL_EL2 + b.ne 1f + mrs x0, sctlr_el2 + ldr x1, =SCTLR_EL2_FLAGS + bic x0, x0, x1 + msr sctlr_el2, x0 + isb + b 2f +1: mrs x0, sctlr_el1 + ldr x1, =SCTLR_EL2_FLAGS + bic x0, x0, x1 + msr sctlr_el1, x0 + isb
quoted
+.Ldone: +> > > > dsb> > > > sy +> > > > isb +> > > > ic> > > > ialluis +> > > > dsb> > > > syWhy the second dsb?
I removed the first isb as Mark suggested.
quoted
+> > > > isb + +> > > > /* Start new image. */ +> > > > ldr> > > > x4, .Lkimage_start +> > > > mov> > > > x0, xzr +> > > > mov> > > > x1, xzr +> > > > mov> > > > x2, xzr +> > > > mov> > > > x3, xzrOnce the kexec'd kernel is booting, I get:quoted
WARNING: x1-x3 nonzero in violation of boot protocol: x1: 0000000080008000 x2: 0000000000000020 x3: 0000000000000020 This indicates a broken bootloader or old kernelPresumably this 'kimage_start' isn't pointing to the new kernel, but the purgatory code, (which comes from user-space?). (If so what are these xzr-s for?)
The warning was from the arm64 purgatory in kexec-tools, now fixed. We don't need to zero the registers anymore. At one time I had an option where the kernel found the dtb section and jumped directly to the new image as the 32 bit arm kernel does.
+/* The machine_kexec routine sets these variables via offsets fromquoted
+ * arm64_relocate_new_kernel. + */ + +/* + * .Lkimage_start - Copy of image->start, the entry point of the new + * image. + */ +.Lkimage_start: +> > > > .quad> > > > 0x0 + +/* + * .Lkimage_head - Copy of image->head, the list of kimage entries. + */ +.Lkimage_head: +> > > > .quad> > > > 0x0 +I assume these .quad-s are used because you can't pass the values in via registers - due to the complicated soft_restart(). Given you are the only user, couldn't you simplify it to do all the disabling in arm64_relocate_new_kernel?
I moved some things from cpu_reset to arm64_relocate_new_kernel, but from what Takahiro has said, to support a modular kvm some of the CPU shutdown code will be shared. Maybe we can look into simplifying things once work on modular kvm is started.
From 'kexec -e' to the first messages from the new kernel takes ~1 minute on Juno, Did you see a similar delay? Or should I go looking for what I've configured wrong!?
As Pratyush has mentioned this is most likely due to the dcaches being disabled.
(Copying code with the mmu+caches on, then cleaning to PoC was noticeably faster for hibernate) I've used this series for kexec-ing between 4K and 64K page_size kernels on Juno.
Thanks for testing. -Geoff