[PATCH v29 3/9] arm64: kdump: reserve memory for crash dump kernel
From: AKASHI Takahiro <hidden>
Date: 2017-01-19 09:49:42
Also in:
kexec
On Tue, Jan 17, 2017 at 11:54:42AM +0000, Mark Rutland wrote:
On Tue, Jan 17, 2017 at 05:20:44PM +0900, AKASHI Takahiro wrote:quoted
On Fri, Jan 13, 2017 at 11:39:15AM +0000, Mark Rutland wrote:quoted
Great! I think it would be better to follow the approach of mark_rodata_ro(), rather than opening up set_memory_*(), but otherwise, it looks like it should work.I'm not quite sure what the approach of mark_rodata_ro() means, but I found that using create_mapping_late() may cause two problems: 1) it fails when PTE_CONT bits mismatch between an old and new mmu entry. This can happen, say, if the memory range for crash dump kernel starts in the mid of _continuous_ pages.That should only happen if we try to remap a segment different to what we originally mapped. I was intending that we'd explicitly map the reserved region separately in the boot path, like we do for kernel segments in map_kernel(). We would allow sections and/or CONT entires. Then, in __map_memblock() we'd then skip that range as we do for the linear map alias of the kernel image. That way, we can later use create_mapping_late for that same region, and it should handle sections and/or CONT entries in the exact same way as it does for the kernel image segments in mark_rodata_ro().
I see. Which one do you prefer, yours above or my (second) solution? Either way, they do almost the same thing in terms of mapping.
quoted
2) The control code page, of one-page size, is still written out in machine_kexec() which is called at a crash, and this means that the range must be writable even after kexec_load(), but create_mapping_late() does not handle a case of changing attributes for a single page which is in _section_ mapping. We cannot make single-page mapping for the control page since the address of that page is not determined at the boot time.That is a problem. I'm not sure I follow how set_memory_*() helps here though?quoted
As for (1), we need to call memblock_isolate_range() to make the region an independent one.quoted
Either way, this still leaves us with an RO alias on crashed cores (and potential cache attribute mismatches in future). Do we need to read from the region later,I believe not, but the region must be _writable_ as I mentioned in (2) above. To avoid this issue, we have to move copying the control code page to machine_kexec_prepare() which is called in kexec_load() and so the region is writable anyway then. I want Geoff to affirm that this change is safe. (See my second solution below.)From a quick scan that looks ok.quoted
quoted
or could we unmap it entirely?given the change above, I think we can.
I'm now asking Geoff ...
Great!quoted
Is there any code to re-use especially for unmapping?I don't think we have much code useful for unmapping. We could re-use create_mapping_late for this, passing a set of prot bits that means the entries are invalid (e.g. have a PAGE_KERNEL_INVALID).
Do you really think that we should totally invalidate mmu entries? I guess that, given proper cache & TLB flush operations, RO attribute is good enough for memory consistency, no? (None accesses the region, as I said, except in the case of re-loading crash dump kernel though.)
We'd have to perform the TLB invalidation ourselves, but that shouldn't be too painful.
Do we need to invalidate TLBs not only before but also after changing permission attributes as make_rodata_ro() does? -Takahiro AKASHI
Thanks, Mark.quoted
===8<===diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c index c0fc3d458195..80a52e9aaf73 100644 --- a/arch/arm64/kernel/machine_kexec.c +++ b/arch/arm64/kernel/machine_kexec.c@@ -26,8 +26,6 @@ extern const unsigned char arm64_relocate_new_kernel[]; extern const unsigned long arm64_relocate_new_kernel_size; -static unsigned long kimage_start; - /** * kexec_image_info - For debugging output. */@@ -68,7 +66,7 @@ void machine_kexec_cleanup(struct kimage *kimage) */ int machine_kexec_prepare(struct kimage *kimage) { - kimage_start = kimage->start; + void *reboot_code_buffer; kexec_image_info(kimage);@@ -77,6 +75,21 @@ int machine_kexec_prepare(struct kimage *kimage) return -EBUSY; } + reboot_code_buffer = + phys_to_virt(page_to_phys(kimage->control_code_page)); + + /* + * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use + * after the kernel is shut down. + */ + memcpy(reboot_code_buffer, arm64_relocate_new_kernel, + arm64_relocate_new_kernel_size); + + /* Flush the reboot_code_buffer in preparation for its execution. */ + __flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size); + flush_icache_range((uintptr_t)reboot_code_buffer, + arm64_relocate_new_kernel_size); + return 0; }@@ -147,7 +160,6 @@ static void kexec_segment_flush(const struct kimage *kimage) void machine_kexec(struct kimage *kimage) { phys_addr_t reboot_code_buffer_phys; - void *reboot_code_buffer; /* * New cpus may have become stuck_in_kernel after we loaded the image.@@ -156,7 +168,6 @@ void machine_kexec(struct kimage *kimage) !WARN_ON(kimage == kexec_crash_image)); reboot_code_buffer_phys = page_to_phys(kimage->control_code_page); - reboot_code_buffer = phys_to_virt(reboot_code_buffer_phys); kexec_image_info(kimage);@@ -164,26 +175,12 @@ void machine_kexec(struct kimage *kimage) kimage->control_code_page); pr_debug("%s:%d: reboot_code_buffer_phys: %pa\n", __func__, __LINE__, &reboot_code_buffer_phys); - pr_debug("%s:%d: reboot_code_buffer: %p\n", __func__, __LINE__, - reboot_code_buffer); pr_debug("%s:%d: relocate_new_kernel: %p\n", __func__, __LINE__, arm64_relocate_new_kernel); pr_debug("%s:%d: relocate_new_kernel_size: 0x%lx(%lu) bytes\n", __func__, __LINE__, arm64_relocate_new_kernel_size, arm64_relocate_new_kernel_size); - /* - * Copy arm64_relocate_new_kernel to the reboot_code_buffer for use - * after the kernel is shut down. - */ - memcpy(reboot_code_buffer, arm64_relocate_new_kernel, - arm64_relocate_new_kernel_size); - - /* Flush the reboot_code_buffer in preparation for its execution. */ - __flush_dcache_area(reboot_code_buffer, arm64_relocate_new_kernel_size); - flush_icache_range((uintptr_t)reboot_code_buffer, - arm64_relocate_new_kernel_size); - /* Flush the kimage list and its buffers. */ kexec_list_flush(kimage);@@ -206,7 +203,7 @@ void machine_kexec(struct kimage *kimage) */ cpu_soft_restart(kimage != kexec_crash_image, - reboot_code_buffer_phys, kimage->head, kimage_start, 0); + reboot_code_buffer_phys, kimage->head, kimage->start, 0); BUG(); /* Should never get here. */ }diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 569ec3325bc8..e4cc170edc0c 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c@@ -90,6 +90,7 @@ early_param("initrd", early_initrd); static void __init reserve_crashkernel(void) { unsigned long long crash_size, crash_base; + int start_rgn, end_rgn; int ret; ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),@@ -120,6 +121,8 @@ static void __init reserve_crashkernel(void) return; } } + memblock_isolate_range(&memblock.memory, crash_base, crash_size, + &start_rgn, &end_rgn); memblock_reserve(crash_base, crash_size); pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index 17243e43184e..b7c75845407a 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c@@ -22,6 +22,8 @@ #include <linux/kernel.h> #include <linux/errno.h> #include <linux/init.h> +#include <linux/ioport.h> +#include <linux/kexec.h> #include <linux/libfdt.h> #include <linux/mman.h> #include <linux/nodemask.h>@@ -817,3 +819,27 @@ int pmd_clear_huge(pmd_t *pmd) pmd_clear(pmd); return 1; } + +#ifdef CONFIG_KEXEC_CORE +void arch_kexec_protect_crashkres(void) +{ + flush_tlb_all(); + + create_mapping_late(crashk_res.start, __phys_to_virt(crashk_res.start), + resource_size(&crashk_res), PAGE_KERNEL_RO); + + /* flush the TLBs after updating live kernel mappings */ + flush_tlb_all(); +} + +void arch_kexec_unprotect_crashkres(void) +{ + flush_tlb_all(); + + create_mapping_late(crashk_res.start, __phys_to_virt(crashk_res.start), + resource_size(&crashk_res), PAGE_KERNEL); + + /* flush the TLBs after updating live kernel mappings */ + flush_tlb_all(); +} +#endif===>8===