A crash on ARM64 in move_freepages_block due to uninitialized pages in reserved memory
From: james.morse@arm.com (James Morse)
Date: 2018-08-23 14:06:13
Also in:
linux-mm
Hi Mikulas,

On 23/08/18 12:02, Mikulas Patocka wrote:
> On Tue, 21 Aug 2018, James Morse wrote:
>> On 08/21/2018 11:44 AM, Michal Hocko wrote:
>>> On Fri 17-08-18 15:44:27, Mikulas Patocka wrote:
>>>> I report this crash on ARM64 on the kernel 4.17.11. The reason is that the
>>>> function move_freepages_block accesses contiguous runs of
>>>> pageblock_nr_pages. The ARM64 firmware sets holes of reserved memory there
>>>> and when move_freepages_block stumbles over this hole, it accesses
>>>> uninitialized page structures and crashes.
>>
>> Any idea if this is nomap (so a hole in the linear map), or a missing struct
>> page?
>
> The page for this hole seems to be filled with 0xff.

This sounds like a memblock:nomap region: it has a struct page, but that struct
page hasn't been initialized.
deferred_init_memmap() won't initialise struct pages for memblock:nomap pages as
its for_each_free_mem_range() loops use MEMBLOCK_NONE as the required flags.
pfn_valid() will return false for these nomap pages, so the struct page should
never be accessed.
For the fault you're seeing, move_freepages() is using pfn_valid_within(), but
this is optimised out as you don't have HOLES_IN_ZONE.
This looks like a disconnect between nomap, ARCH_HAS_HOLES_MEMORYMODEL and
HOLES_IN_ZONE.
Arm64 only enables HOLES_IN_ZONE for NUMA systems:
6d526ee26ccd ("arm64: mm: enable CONFIG_HOLES_IN_ZONE for NUMA")
It doesn't look like you can disable ARCH_HAS_HOLES_MEMORYMODEL or SPARSEMEM
for arm64.
My best guess is that pfn_valid_within() shouldn't be optimised out if
ARCH_HAS_HOLES_MEMORYMODEL is set, even if HOLES_IN_ZONE isn't.
Does something like this solve the problem?:
============================%<============================
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2dc52a..5e27095a15f4 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -1295,7 +1295,7 @@ void memory_present(int nid, unsigned long start, unsigned long end);
  * pfn_valid_within() should be used in this case; we optimise this away
  * when we have no holes within a MAX_ORDER_NR_PAGES block.
  */
-#ifdef CONFIG_HOLES_IN_ZONE
+#if defined(CONFIG_HOLES_IN_ZONE) || defined(CONFIG_ARCH_HAS_HOLES_MEMORYMODEL)
 #define pfn_valid_within(pfn) pfn_valid(pfn)
 #else
 #define pfn_valid_within(pfn) (1)
============================%<============================
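
To make the intent of the change above concrete, here is a rough userspace model, not kernel code. Every toy_* name, the 8-page block size and the array-based memory map are assumptions made up for illustration. It walks one block the way move_freepages() walks a pageblock and counts how many uninitialised struct pages would be touched, once with a validity check that is compiled down to "always true" and once with a check that consults the memory map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy pageblock size; the real value is pageblock_nr_pages. */
#define TOY_BLOCK_PAGES 8

/* Stand-in for struct page: only records whether it was ever initialised. */
struct toy_page {
	bool initialised;
};

/* Models pfn_valid(): false for pfns that fall inside a nomap hole. */
static bool toy_pfn_valid(const bool *nomap, unsigned long pfn)
{
	return !nomap[pfn];
}

/* Models pfn_valid_within() when it is optimised away: always true. */
static bool toy_pfn_valid_within_unchecked(const bool *nomap, unsigned long pfn)
{
	(void)nomap;
	(void)pfn;
	return true;
}

/*
 * Walk one block and count the uninitialised struct pages that would be
 * dereferenced. valid_within selects which of the two checks is used.
 */
static size_t toy_count_bad_accesses(const struct toy_page *pages,
				     const bool *nomap,
				     bool (*valid_within)(const bool *, unsigned long))
{
	size_t bad = 0;

	assert(pages && nomap && valid_within);

	for (unsigned long pfn = 0; pfn < TOY_BLOCK_PAGES; pfn++) {
		if (!valid_within(nomap, pfn))
			continue;	/* hole skipped, struct page never read */
		if (!pages[pfn].initialised)
			bad++;		/* the kernel would read garbage here */
	}
	return bad;
}
```

If pfns 3 and 4 are marked nomap and left uninitialised, passing toy_pfn_valid_within_unchecked counts two bad accesses, while passing toy_pfn_valid counts none.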

>> To test Laura's bounds-of-zone theory [0], could you put some empty space
>> between the nvme and the System RAM? (It sounds like this is a KVM guest).
>> Reducing the amount of memory is probably easiest.
>
> This is not KVM - it is real hardware with a real PCIe nvme device. I don't
> have a smaller memory stick.

Ah, you mentioned KVM/guests further down; given your nvme is right up against
the top of the System RAM, I assumed this was a guest!

> The board can use u-boot firmware or EFI firmware. The u-boot firmware
> doesn't put a hole in the memory map and the board has been running with it
> for several months without a problem.
>
> The EFI firmware puts a hole below 0xc0000000 and I got a crash after two
> weeks of uptime.

This will be because of UEFI's use of nomap when the EFI memory map describes
the memory as having attributes incompatible with the kernel linear map. (If you
boot with efi=debug, it will dump the UEFI memory map.)

> I analyzed the assembler:
> PageBuddy in move_freepages returns false.
> Then we call PageLRU, the macro calls PF_HEAD which is compound_head().
> compound_head reads page->compound_head, it is 0xffffffffffffffff, so it
> returns 0xfffffffffffffffe - and accessing this address causes the crash.

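The arithmetic in that analysis can be reproduced in userspace with a small sketch. This is a model of the logic of the kernel's compound_head() helper (the function PF_HEAD expands to), not the kernel source itself. struct model_page and model_compound_head() are hypothetical names, and the struct keeps only the one field that matters here.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for struct page, reduced to the compound_head word. */
struct model_page {
	uintptr_t compound_head;	/* bit 0 set means "tail page" */
};

/*
 * Models compound_head(): if bit 0 of page->compound_head is set, the
 * page is treated as a tail page and the word minus one is taken as the
 * address of its head page. Otherwise the page is its own head.
 */
static uintptr_t model_compound_head(const struct model_page *page)
{
	uintptr_t head;

	assert(page);

	head = page->compound_head;
	if (head & 1)
		return head - 1;
	return (uintptr_t)page;
}
```

With compound_head set to all-ones (UINTPTR_MAX), bit 0 is set, so the result is UINTPTR_MAX - 1. On a 64-bit build that is 0xfffffffffffffffe, the bogus "head page" address reported above.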
Thanks! That wasn't straightforward to work out without the vmlinux.

Because you see all-ones, even in KVM, it looks like the struct page is being
initialized like that deliberately... I haven't found where this might be
happening.

Thanks,

James