[PATCH 08/16] arm64/kexec: Add core kexec support

From: geoff@infradead.org (Geoff Levand)
Date: 2015-11-03 00:30:16
Also in: kexec

Hi James,

On Fri, 2015-10-30 at 16:29 +0000, James Morse wrote:

On 20/10/15 00:38, Geoff Levand wrote:

quoted

+config KEXEC
+> > 	> > depends on (!SMP || PM_SLEEP_SMP)

Commit 4b3dc9679cf7 got rid of '!SMP'.

Fixed for v11.

quoted

- * Copyright (C) 2015 Huawei Futurewei Technologies.
+ * Copyright (C) Huawei Futurewei Technologies.

Move this hunk into the patch that adds the file?

Was fixed in v10.2.

quoted

+++ b/arch/arm64/kernel/relocate_kernel.S

If I've followed all this through properly:

With KVM - mmu+caches are configured, but then disabled by 'kvm: allows kvm
cpu hotplug'. This 'arm64_relocate_new_kernel' function then runs at EL2
with M=0, C=0, I=0.

Without KVM - when there is no user of EL2, the mmu+caches are left in
whatever state the bootloader (or efi stub) left them in. From
Documentation/arm64/booting.txt:

quoted

Instruction cache may be on or off.

and

quoted

System caches which respect the architected cache maintenance by VA
operations must be configured and may be enabled.

So 'arm64_relocate_new_kernel' function could run at EL2 with M=0, C=?, I=?.

I think this means you can't guarantee anything you are copying below
actually makes it through the caches - booting secondary processors may get
stale values.

The EFI stub disables the M and C bits when booted at EL2 with uefi - but
it leaves the instruction cache enabled. You only clean the
reboot_code_buffer from the data cache, so there may be stale values in the
instruction cache.

I think you need to disable the i-cache at EL1. If you jump to EL2, I think
you need to disable the I/C bits there too - as you can't rely on the code
in 'kvm: allows kvm cpu hotplug' to do this in a non-kvm case.

For consistency across all code paths, we could put in something like this:

+       /* Clear SCTLR_ELx_FLAGS. */
+       mrs     x0, CurrentEL
+       cmp     x0, #CurrentEL_EL2
+       b.ne    1f
+       mrs     x0, sctlr_el2
+       ldr     x1, =SCTLR_EL2_FLAGS
+       bic     x0, x0, x1
+       msr     sctlr_el2, x0
+       isb
+       b       2f
+1:     mrs     x0, sctlr_el1
+       ldr     x1, =SCTLR_EL2_FLAGS
+       bic     x0, x0, x1
+       msr     sctlr_el1, x0
+       isb

quoted

+.Ldone:
+> > 	> > dsb> > 	> > sy
+> > 	> > isb
+> > 	> > ic> > 	> > ialluis
+> > 	> > dsb> > 	> > sy

Why the second dsb?

I removed the first isb as Mark suggested.

quoted

+> > 	> > isb
+
+> > 	> > /* Start new image. */
+> > 	> > ldr> > 	> > x4, .Lkimage_start
+> > 	> > mov> > 	> > x0, xzr
+> > 	> > mov> > 	> > x1, xzr
+> > 	> > mov> > 	> > x2, xzr
+> > 	> > mov> > 	> > x3, xzr

Once the kexec'd kernel is booting, I get:

quoted

WARNING: x1-x3 nonzero in violation of boot protocol:
        x1: 0000000080008000
        x2: 0000000000000020
        x3: 0000000000000020
This indicates a broken bootloader or old kernel

Presumably this 'kimage_start' isn't pointing to the new kernel, but the
purgatory code, (which comes from user-space?). (If so what are these xzr-s
for?)

The warning was from the arm64 purgatory in kexec-tools, now fixed.

We don't need to zero the registers anymore.   At one time I had
an option where the kernel found the dtb section and jumped
directly to the new image as the 32 bit arm kernel does.

+/* The machine_kexec routine sets these variables via offsets from

quoted

+ * arm64_relocate_new_kernel.
+ */
+
+/*
+ * .Lkimage_start - Copy of image->start, the entry point of the new
+ * image.
+ */
+.Lkimage_start:
+> > 	> > .quad> > 	> > 0x0
+
+/*
+ * .Lkimage_head - Copy of image->head, the list of kimage entries.
+ */
+.Lkimage_head:
+> > 	> > .quad> > 	> > 0x0
+

I assume these .quad-s are used because you can't pass the values in via
registers - due to the complicated soft_restart(). Given you are the only
user, couldn't you simplify it to do all the disabling in
arm64_relocate_new_kernel?

I moved some things from cpu_reset to arm64_relocate_new_kernel, but
from what Takahiro has said, to support a modular kvm some of the CPU
shutdown code will be shared.  Maybe we can look into simplifying things
once work on modular kvm is started.

From 'kexec -e' to the first messages from the new kernel takes ~1 minute
on Juno, Did you see a similar delay? Or should I go looking for what I've
configured wrong!?

As Pratyush has mentioned this is most likely due to the dcaches
being disabled.

(Copying code with the mmu+caches on, then cleaning to PoC was noticeably
faster for hibernate)


I've used this series for kexec-ing between 4K and 64K page_size kernels on
Juno.

Thanks for testing.

-Geoff

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help