Re: [PATCH 0/2] arm64: kexec_file_load vs memory reservations
From: Marc Zyngier <maz@kernel.org>
Date: 2021-06-02 16:02:41
Also in: kexec
Hi James,

On Wed, 02 Jun 2021 15:22:00 +0100, James Morse wrote:
> Hi Marc,
>
> On 29/04/2021 14:35, Marc Zyngier wrote:
>> It recently became apparent that using kexec with kexec_file_load() on
>> arm64 is pretty similar to playing Russian roulette.
>>
>> Depending on the amount of memory, the HW supported and the firmware
>> interface used, your secondary kernel may overwrite critical memory
>> regions without which the secondary kernel cannot boot (the GICv3 LPI
>> tables being a prime example of such reserved regions).
>>
>> It turns out that there are at least two ways for reserved memory
>> regions to be described to kexec: /proc/iomem for the userspace
>> implementation, and memblock.reserved for kexec_file.
>
> One is spilled into the other by request_standard_resources()...
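For the userspace implementation, the reserved ranges a loader has to avoid are visible as lines in /proc/iomem. Below is a minimal C sketch of parsing one such line. It assumes the usual "start-end : name" layout with two spaces of indentation per nesting level, and that memblock-reserved regions are exported under the name "reserved". struct iomem_entry, parse_iomem_line() and is_reserved_entry() are hypothetical helpers, not kexec-tools code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* One parsed /proc/iomem line (hypothetical helper type). */
struct iomem_entry {
    unsigned long long start; /* first byte of the region */
    unsigned long long end;   /* last byte of the region (inclusive) */
    int depth;                /* nesting level, two spaces per level */
    char name[64];            /* e.g. "System RAM", "reserved" */
};

/*
 * Parse a line such as "  fa7d0000-fa7fffff : reserved".
 * Returns false if the line does not have the expected shape.
 */
static bool parse_iomem_line(const char *line, struct iomem_entry *e)
{
    int indent = 0;
    int consumed = 0;

    assert(line != NULL && e != NULL);

    while (line[indent] == ' ')
        indent++;

    /* %n records where the name starts; it stays 0 if " : " is missing. */
    if (sscanf(line + indent, "%llx-%llx : %n",
               &e->start, &e->end, &consumed) != 2 || consumed == 0)
        return false;
    if (e->start > e->end)
        return false;

    const char *name = line + indent + consumed;
    size_t len = strcspn(name, "\n");
    if (len == 0 || len >= sizeof(e->name))
        return false;

    memcpy(e->name, name, len);
    e->name[len] = '\0';
    e->depth = indent / 2;
    return true;
}

/* Assumption: memblock-reserved regions show up under this exact name. */
static bool is_reserved_entry(const struct iomem_entry *e)
{
    return strcmp(e->name, "reserved") == 0;
}
```

Running this over every line of /proc/iomem and collecting the "reserved" entries gives roughly the set of ranges a userspace loader must steer clear of. Only privileged readers see the real addresses; unprivileged reads show zeros.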
>> And of course, our LPI tables are only reserved using the resource
>> tree, leading to the aforementioned stamping.
>
> Presumably well after efi_init() has run...
Yup, much later. And we can keep on reserving memory as long as we boot
new CPUs, so a one-off sync between the two doesn't really help here.
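The problem with a one-off sync can be shown with a toy model. If the set of reserved ranges is copied once, any reservation made afterwards (such as LPI tables allocated while bringing up more CPUs) is invisible to whoever consults the copy. The C sketch below is purely illustrative. struct range_set, range_set_add() and overlaps_any() are hypothetical and are not the kernel's memblock or resource code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RANGES 16

/* Inclusive physical address range (hypothetical helper type). */
struct range {
    unsigned long long start, end;
};

/* A fixed-size set of ranges standing in for "the reservations". */
struct range_set {
    struct range r[MAX_RANGES];
    size_t n;
};

/* Record a new reservation; fails if the set is full or the range is bogus. */
static bool range_set_add(struct range_set *s,
                          unsigned long long start, unsigned long long end)
{
    if (s->n == MAX_RANGES || start > end)
        return false;
    s->r[s->n].start = start;
    s->r[s->n].end = end;
    s->n++;
    return true;
}

/* Would placing something at [start, end] stamp on a recorded reservation? */
static bool overlaps_any(const struct range_set *s,
                         unsigned long long start, unsigned long long end)
{
    for (size_t i = 0; i < s->n; i++) {
        if (start <= s->r[i].end && s->r[i].start <= end)
            return true;
    }
    return false;
}
```

If a copy of the set is taken before a late reservation is added, checking a candidate placement against the copy reports no overlap while the live set does. That is the kind of stamping described above; only a description that keeps tracking reservations as they happen avoids it.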
>> Similar things could happen with ACPI tables as well.
>
> efi_init() calls reserve_regions(), which has:
> | /* keep ACPI reclaim memory intact for kexec etc. */
> | if (md->type == EFI_ACPI_RECLAIM_MEMORY)
> |         memblock_reserve(paddr, size);
>
> This is also what stops mm from allocating them, as memblock.reserved
> gets copied into the PG_reserved flag by free_low_memory_core_early()'s
> calls to reserve_bootmem_region().
>
> Is your machine's firmware putting them in a region with a different type?
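As a rough illustration of the path described above (memblock.reserved causing pages to be flagged PG_reserved, so the page allocator never hands them out), here is a toy C model. It assumes 4 KiB pages and a tiny flat page array. toy_reserve_region() is only meant to mimic the page rounding of reserve_bootmem_region(), and toy_alloc_page() is a hypothetical first-fit allocator; neither is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SHIFT 12 /* assume 4 KiB pages */
#define TOY_PAGE_SIZE (1ULL << TOY_PAGE_SHIFT)
#define TOY_NR_PAGES 64

/* One "reserved" bit per page frame, standing in for PG_reserved. */
static bool page_reserved[TOY_NR_PAGES];
/* Frames already handed out by the toy allocator. */
static bool page_in_use[TOY_NR_PAGES];

/*
 * Mark every frame overlapping [start, end) as reserved, rounding start
 * down and end up to page boundaries.
 */
static void toy_reserve_region(unsigned long long start, unsigned long long end)
{
    unsigned long long pfn = start >> TOY_PAGE_SHIFT;
    unsigned long long end_pfn = (end + TOY_PAGE_SIZE - 1) >> TOY_PAGE_SHIFT;

    for (; pfn < end_pfn && pfn < TOY_NR_PAGES; pfn++)
        page_reserved[pfn] = true;
}

/* First-fit allocator that never returns a reserved frame; -1 if exhausted. */
static long toy_alloc_page(void)
{
    for (size_t pfn = 0; pfn < TOY_NR_PAGES; pfn++) {
        if (!page_reserved[pfn] && !page_in_use[pfn]) {
            page_in_use[pfn] = true;
            return (long)pfn;
        }
    }
    return -1;
}
```

A region that never reaches memblock.reserved (LPI tables only recorded in the resource tree, say) would get no reserved bit in this model, which is why the kexec_file side cannot see it.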
Good question. Moritz (cc'd) saw the tables being overwritten on his
system (which I don't have access to), so it is not entirely clear-cut
how this happens. My SQ box reports the ACPI region as "ACPI Reclaim",
so I guess it works as expected here.
> (The UEFI spec has something to say: see 2.3.6 "AArch64 Platforms":
> | ACPI Tables loaded at boot time can be contained in memory of type
> | EfiACPIReclaimMemory (recommended) or EfiACPIMemoryNVS
>
> NVS would fail the is_usable_memory() check earlier, so gets treated
> as nomap)
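The memory-type handling discussed in this thread can be modelled as a small classification function: ACPI Reclaim is ordinary, writeback-mappable memory that additionally gets memblock_reserve()d, while ACPI NVS fails the usability check and ends up nomap. The enum and toy_* functions below are a simplified, hypothetical sketch of that logic. It ignores details such as soft-reserved (EFI_MEMORY_SP) regions and reduces the memory attributes to a single writeback flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Local subset of UEFI memory types, named after the spec (toy only). */
enum efi_type {
    EFI_LOADER_CODE,
    EFI_LOADER_DATA,
    EFI_BOOT_SERVICES_CODE,
    EFI_BOOT_SERVICES_DATA,
    EFI_CONVENTIONAL_MEMORY,
    EFI_ACPI_RECLAIM_MEMORY,
    EFI_ACPI_MEMORY_NVS,
    EFI_RUNTIME_SERVICES_DATA,
};

enum placement {
    MEM_USABLE,          /* plain System RAM */
    MEM_USABLE_RESERVED, /* RAM, but memblock_reserve()d (kept intact) */
    MEM_NOMAP,           /* not treated as usable RAM at all */
};

/* Usable only for "RAM-like" types that can be mapped writeback. */
static bool toy_is_usable(enum efi_type t, bool writeback)
{
    switch (t) {
    case EFI_LOADER_CODE:
    case EFI_LOADER_DATA:
    case EFI_BOOT_SERVICES_CODE:
    case EFI_BOOT_SERVICES_DATA:
    case EFI_CONVENTIONAL_MEMORY:
    case EFI_ACPI_RECLAIM_MEMORY:
        return writeback;
    default:
        return false;
    }
}

static enum placement toy_classify(enum efi_type t, bool writeback)
{
    if (!toy_is_usable(t, writeback))
        return MEM_NOMAP;
    /* keep ACPI reclaim memory intact, as in the quoted snippet */
    if (t == EFI_ACPI_RECLAIM_MEMORY)
        return MEM_USABLE_RESERVED;
    return MEM_USABLE;
}
```

With writeback set, conventional memory classifies as MEM_USABLE, ACPI Reclaim as MEM_USABLE_RESERVED and ACPI NVS as MEM_NOMAP, matching the two outcomes described for the spec's two permitted table locations.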
Note that I've since changed tactics and proposed that we fully rely on
the resource tree instead[1].

Thanks,

M.

[1] https://lore.kernel.org/r/20210531095720.77469-1-maz@kernel.org

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel