Re: [PATCH v3 1/2] x86/setup: don't remove E820_TYPE_RAM for pfn 0
From: Mike Rapoport <rppt@kernel.org>
Date: 2021-01-13 15:36:27
Also in: lkml
On Wed, Jan 13, 2021 at 01:56:45PM +0100, David Hildenbrand wrote:
> On 11.01.21 20:40, Mike Rapoport wrote:
> > From: Mike Rapoport <redacted>
> >
> > The first 4Kb of memory is a BIOS owned area and to avoid its allocation
> > for the kernel it was not listed in e820 tables as memory. As the result,
> > pfn 0 was never recognised by the generic memory management and it is not a
> > part of neither node 0 nor ZONE_DMA.
> >
> > If set_pfnblock_flags_mask() would be ever called for the pageblock
> > corresponding to the first 2Mbytes of memory, having pfn 0 outside of
> > ZONE_DMA would trigger
> >
> > 	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
> >
> > Along with reserving the first 4Kb in e820 tables, several first pages are
> > reserved with memblock in several places during setup_arch(). These
> > reservations are enough to ensure the kernel does not touch the BIOS area
> > and it is not necessary to remove E820_TYPE_RAM for pfn 0.
> >
> > Remove the update of e820 table that changes the type of pfn 0 and move the
> > comment describing why it was done to trim_low_memory_range() that reserves
> > the beginning of the memory.
> >
> > Signed-off-by: Mike Rapoport <redacted>
> > ---
> >  arch/x86/kernel/setup.c | 20 +++++++++-----------
> >  1 file changed, 9 insertions(+), 11 deletions(-)
> >
> > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> > index 740f3bdb3f61..3412c4595efd 100644
> > --- a/arch/x86/kernel/setup.c
> > +++ b/arch/x86/kernel/setup.c
> > @@ -660,17 +660,6 @@ static void __init trim_platform_memory_ranges(void)
> >
> >  static void __init trim_bios_range(void)
> >  {
> > -	/*
> > -	 * A special case is the first 4Kb of memory;
> > -	 * This is a BIOS owned area, not kernel ram, but generally
> > -	 * not listed as such in the E820 table.
> > -	 *
> > -	 * This typically reserves additional memory (64KiB by default)
> > -	 * since some BIOSes are known to corrupt low memory.  See the
> > -	 * Kconfig help text for X86_RESERVE_LOW.
> > -	 */
> > -	e820__range_update(0, PAGE_SIZE, E820_TYPE_RAM, E820_TYPE_RESERVED);
> > -
> >  	/*
> >  	 * special case: Some BIOSes report the PC BIOS
> >  	 * area (640Kb -> 1Mb) as RAM even though it is not.
> > @@ -728,6 +717,15 @@ early_param("reservelow", parse_reservelow);
> >
> >  static void __init trim_low_memory_range(void)
> >  {
> > +	/*
> > +	 * A special case is the first 4Kb of memory;
> > +	 * This is a BIOS owned area, not kernel ram, but generally
> > +	 * not listed as such in the E820 table.
> > +	 *
> > +	 * This typically reserves additional memory (64KiB by default)
> > +	 * since some BIOSes are known to corrupt low memory.  See the
> > +	 * Kconfig help text for X86_RESERVE_LOW.
> > +	 */
> >  	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
> >  }
>
> The only somewhat-confusing thing is that in-between
> e820__memblock_setup() and trim_low_memory_range(), we already have
> memblock allocations. So [0..4095] might look like ordinary memory until
> we reserve it later on.
>
> E.g., reserve_real_mode() does a
>
> mem = memblock_find_in_range(0, 1<<20, size, PAGE_SIZE);
> ...
> memblock_reserve(mem, size);
> set_real_mode_mem(mem);
>
> which looks kind of suspicious to me. Most probably I am missing
> something, just wanted to point that out. We might want to do such
> trimming/adjustments before any kind of allocations.

You are right and it looks suspicious, but the first page is reserved at
the very beginning of x86::setup_arch() and, moreover, memblock never
allocates it (look at memblock::memblock_find_in_range_node()).

As for the range 0x1000 <-> reserve_low, we are unlikely to allocate it in
the default top-down mode. The bottom-up mode was only allocating memory
above the kernel so this would also prevent allocation of the lowest
memory, at least until the recent changes for CMA allocation:

https://lore.kernel.org/lkml/20201217201214.3414100-1-guro@fb.com

That said, we'd better consolidate all the trim_some_memory() and move it
closer to the beginning of setup_arch(). I'm going to take a look at it in
the next few days.
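
The two memblock points above can be illustrated with a rough userspace
sketch in C. This is a toy model, not the kernel code: toy_memblock,
toy_find_in_range() and TOY_PAGE_SIZE are hypothetical names, and the model
assumes a single free RAM region starting at 0 with no prior reservations.
It only mirrors the idea that the range finder clamps the search start to
one page, so page 0 is never returned, and that a top-down search picks the
highest fitting address while a bottom-up search picks the lowest.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages, as on x86 */

typedef uint64_t phys_addr_t;

/* Toy model: free RAM is [0, ram_end) and nothing is reserved yet. */
struct toy_memblock {
	phys_addr_t ram_end;
	int bottom_up;		/* 0: top-down (the default), 1: bottom-up */
};

static phys_addr_t toy_align_down(phys_addr_t x, phys_addr_t a)
{
	return x & ~(a - 1);
}

static phys_addr_t toy_align_up(phys_addr_t x, phys_addr_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Find a free range of @size bytes inside [start, end).
 * Returns the base address, or 0 if nothing fits.
 */
static phys_addr_t toy_find_in_range(const struct toy_memblock *mb,
				     phys_addr_t start, phys_addr_t end,
				     phys_addr_t size, phys_addr_t align)
{
	phys_addr_t cand;

	/* alignment must be a non-zero power of two */
	assert(align && (align & (align - 1)) == 0);

	if (end > mb->ram_end)
		end = mb->ram_end;

	/* never hand out the first page, whatever the caller asked for */
	if (start < TOY_PAGE_SIZE)
		start = TOY_PAGE_SIZE;
	if (end < start)
		end = start;
	if (end - start < size)
		return 0;

	if (mb->bottom_up) {
		/* lowest aligned address that still fits */
		cand = toy_align_up(start, align);
		return (cand + size <= end) ? cand : 0;
	}

	/* top-down: highest aligned address that still fits */
	cand = toy_align_down(end - size, align);
	return (cand >= start) ? cand : 0;
}
```

With 1 GiB of RAM modelled and a request shaped like the
reserve_real_mode() one (range 0 to 1<<20, with an assumed size of 0x8000
bytes), the sketch returns 0xf8000 in top-down mode and 0x1000 in bottom-up
mode, and a request for one page inside [0, TOY_PAGE_SIZE) fails. The real
allocator also walks the reserved regions and, in bottom-up mode, applied
the above-the-kernel restriction mentioned earlier; none of that is
modelled here.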

> --
> Thanks,
>
> David / dhildenb

--
Sincerely yours,
Mike.