Thread: 8 messages, 2 authors, 2021-07-31

Re: Regression bisected to fa3354e4ea39 (mm: free_area_init: use maximal zone PFNs rather than zone sizes)

From: Mike Rapoport <hidden>
Date: 2021-07-27 06:44:05
Also in: linux-mm, lkml
Subsystem: alpha port, the rest · Maintainers: Richard Henderson, Matt Turner, Magnus Lindholm, Linus Torvalds

On Mon, Jul 26, 2021 at 02:23:20PM -0700, Matt Turner wrote:
On Mon, Jul 26, 2021 at 1:06 PM Mike Rapoport [off-list ref] wrote:
Hi Matt,

On Mon, Jul 26, 2021 at 12:27:50PM -0700, Matt Turner wrote:

Hi Mike!

Since commit fa3354e4ea39 (mm: free_area_init: use maximal zone PFNs rather
than zone sizes), I get the following BUG on Alpha (an AlphaServer ES47 Marvel)
and loading userspace leads to a segfault:

(I didn't notice this for a long time because of other unrelated regressions,
the pandemic, changing jobs, ...)
I suspect there will be more surprises down the road :)
BUG: Bad page state in process swapper  pfn:2ffc53
page:fffffc000ecf14c0 refcount:0 mapcount:1 mapping:0000000000000000 index:0x0
flags: 0x0()
raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
page dumped because: nonzero mapcount
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-03841-gfa3354e4ea39-dirty #26
       fffffc0001b5bd68 fffffc0001b5be80 fffffc00011cd148 fffffc000ecf14c0
       fffffc00019803df fffffc0001b5be80 fffffc00011ce340 fffffc000ecf14c0
       0000000000000000 fffffc0001b5be80 fffffc0001b482c0 fffffc00027d6618
       fffffc00027da7d0 00000000002ff97a 0000000000000000 fffffc0001b5be80
       fffffc00011d1abc fffffc000ecf14c0 fffffc0002d00000 fffffc0001b5be80
       fffffc0001b2350c 0000000000300000 fffffc0001b48298 fffffc0001b482c0
Trace:
[<fffffc00011cd148>] bad_page+0x168/0x1b0
[<fffffc00011ce340>] free_pcp_prepare+0x1e0/0x290
[<fffffc00011d1abc>] free_unref_page+0x2c/0xa0
[<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
[<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
[<fffffc000101001c>] _stext+0x1c/0x20

I haven't tried reproducing this on other machines or QEMU, but I'd be glad to
if that helps.
If it's reproducible on QEMU I can debug it locally.
Any ideas?
It seems the memory map is not properly initialized. Can you enable
CONFIG_DEBUG_MEMORY_INIT and add mminit_debug=4 to the command line? The
interesting part of the log would be before the "Memory: xK/yK available ..."
line.

Hopefully it'll give some clues.
Sure thing. Please find attached.
aboot: loading uncompressed vmlinuz-5.7.0-03841-gfa3354e4ea39-dirty...
aboot: loading compressed vmlinuz-5.7.0-03841-gfa3354e4ea39-dirty...
aboot: PHDR 0 vaddr 0xfffffc0001010000 offset 0xc0 size 0x17c5ae0
aboot: bss at 0xfffffc00027d5ae0, size 0xe4ea0
aboot: zero-filling 937632 bytes at 0xfffffc00027d5ae0
aboot: loading initrd (5965252 bytes/5825 blocks) at 0xfffffc05ff2cc000
aboot: starting kernel vmlinuz-5.7.0-03841-gfa3354e4ea39-dirty with arguments ro panic=5 domdadm root=/dev/md1 console=srm  mminit_debug=4
Linux version 5.7.0-03841-gfa3354e4ea39-dirty (mattst88@ivybridge) (gcc version 11.1.0 (Gentoo 11.1.0-r2 p3), GNU ld (Gentoo 2.36.1 p3) 2.36.1) #26 SMP Sun Jul 25 18:20:06 PDT 2021
printk: bootconsole [srm0] enabled
Booting on Marvel variation Marvel/EV7 using machine vector MARVEL/EV7 from SRM
Major Options: SMP EV67 VERBOSE_MCHECK DEBUG_SPINLOCK MAGIC_SYSRQ 
Command line: ro panic=5 domdadm root=/dev/md1 console=srm  mminit_debug=4
memcluster 0, usage 1, start        0, end     1984
memcluster 1, usage 0, start     1984, end  1048576
memcluster 2, usage 1, start  2097152, end  2097224
memcluster 3, usage 0, start  2097224, end  3145728
Initial ramdisk at: 0x(____ptrval____) (5965252 bytes)
Found an IO7 at PID 0
Initializing IO7 at PID 0
FIXME: disabling master aborts
FIXME: disabling master aborts
FIXME: disabling master aborts
FIXME: disabling master aborts
SMP: 2 CPUs probed -- cpu_present_mask = 3
Zone ranges:
  DMA      [mem 0x0000000000f80000-0x00000fffffffdfff]
  Normal   empty
Movable zone start for each node
Early memory node ranges
  node   0: [mem 0x0000000000f80000-0x00000001ffffffff]
  node   0: [mem 0x0000000400090000-0x00000005ffffffff]
I think the issue is that memory marked as used in a memcluster is never
added to memblock, which skews the node/zone sizing calculations.
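
To make the skew concrete, here is a rough userspace sketch (not kernel
code) that redoes the arithmetic from the boot log. It assumes 8 KiB pages
(PAGE_SHIFT of 13), which is what the addresses in the log imply; struct
cluster, pfn_phys(), count_pages() and the other helpers are made-up names
for illustration only.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 8 KiB pages, which is what the addresses in the log imply. */
#define PAGE_SHIFT 13

/* Simplified, hypothetical stand-in for the kernel's memcluster descriptor. */
struct cluster {
	uint64_t start_pfn;
	uint64_t end_pfn;	/* exclusive, as printed in the boot log */
	unsigned int usage;
};

/* The four memclusters exactly as reported in the boot log. */
static const struct cluster clusters[] = {
	{       0,    1984, 1 },
	{    1984, 1048576, 0 },
	{ 2097152, 2097224, 1 },
	{ 2097224, 3145728, 0 },
};
#define NR_CLUSTERS (sizeof(clusters) / sizeof(clusters[0]))

static uint64_t pfn_phys(uint64_t pfn)
{
	return pfn << PAGE_SHIFT;
}

/* Sum the pages of clusters that have (used != 0) or do not have
 * (used == 0) one of the two low usage bits set. */
static uint64_t count_pages(int used)
{
	uint64_t pages = 0;
	size_t i;

	for (i = 0; i < NR_CLUSTERS; i++)
		if (((clusters[i].usage & 3) != 0) == (used != 0))
			pages += clusters[i].end_pfn - clusters[i].start_pfn;
	return pages;
}

/* Print only the ranges that a loop skipping used clusters would add. */
static void print_added_ranges(void)
{
	size_t i;

	for (i = 0; i < NR_CLUSTERS; i++) {
		if (clusters[i].usage & 3)
			continue;	/* skipped, so memblock never sees it */
		printf("[mem 0x%016" PRIx64 "-0x%016" PRIx64 "]\n",
		       pfn_phys(clusters[i].start_pfn),
		       pfn_phys(clusters[i].end_pfn) - 1);
	}
}

/* Cross-check against the "Early memory node ranges" lines in the log. */
static void check_against_log(void)
{
	assert(pfn_phys(1984) == 0x0000000000f80000ULL);
	assert(pfn_phys(1048576) - 1 == 0x00000001ffffffffULL);
	assert(pfn_phys(2097224) == 0x0000000400090000ULL);
	assert(pfn_phys(3145728) - 1 == 0x00000005ffffffffULL);
}
```

print_added_ranges() reproduces the two node 0 ranges from the log, and
count_pages(0) gives 2095096 pages, i.e. the 16760768K total in the
"Memory:" line. The 2056 pages counted by count_pages(1) are the ones that
never reach memblock.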

Can you try this patch:
diff --git a/arch/alpha/kernel/setup.c b/arch/alpha/kernel/setup.c
index 7d56c217b235..b4fbbba30aa2 100644
--- a/arch/alpha/kernel/setup.c
+++ b/arch/alpha/kernel/setup.c
@@ -319,18 +319,19 @@ setup_memory(void *kernel_end)
 		       i, cluster->usage, cluster->start_pfn,
 		       cluster->start_pfn + cluster->numpages);
 
-		/* Bit 0 is console/PALcode reserved.  Bit 1 is
-		   non-volatile memory -- we might want to mark
-		   this for later.  */
-		if (cluster->usage & 3)
-			continue;
-
 		end = cluster->start_pfn + cluster->numpages;
 		if (end > max_low_pfn)
 			max_low_pfn = end;
 
 		memblock_add(PFN_PHYS(cluster->start_pfn),
 			     cluster->numpages << PAGE_SHIFT);
+
+		/* Bit 0 is console/PALcode reserved.  Bit 1 is
+		   non-volatile memory -- we might want to mark
+		   this for later.  */
+		if (cluster->usage & 3)
+			memblock_reserve(PFN_PHYS(cluster->start_pfn),
+				         cluster->numpages << PAGE_SHIFT);
 	}
 
 	/*
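
For reference, the effect of the reordering can be modelled in userspace
roughly as below. This is only a sketch of the loop's bookkeeping under the
assumption that memblock_add() and memblock_reserve() simply account the
pages they are given; struct cluster, struct totals, walk_clusters() and
check_totals() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified memcluster descriptor (not the kernel's struct). */
struct cluster {
	uint64_t start_pfn;
	uint64_t numpages;
	unsigned int usage;
};

/* Page counts that a loop like the one in setup_memory() would produce. */
struct totals {
	uint64_t added;		/* pages that would go to memblock_add() */
	uint64_t reserved;	/* pages that would go to memblock_reserve() */
	uint64_t max_low_pfn;
};

static struct totals walk_clusters(const struct cluster *c, int n, int patched)
{
	struct totals t = { 0, 0, 0 };
	int i;

	for (i = 0; i < n; i++) {
		int used = (c[i].usage & 3) != 0;
		uint64_t end = c[i].start_pfn + c[i].numpages;

		/* Old behaviour: used clusters are skipped entirely. */
		if (used && !patched)
			continue;

		if (end > t.max_low_pfn)
			t.max_low_pfn = end;
		t.added += c[i].numpages;

		/* New behaviour: used clusters are added, then reserved. */
		if (used)
			t.reserved += c[i].numpages;
	}
	return t;
}

/* The memclusters from the boot log, converted to start/numpages. */
static const struct cluster log_clusters[] = {
	{       0,    1984, 1 },
	{    1984, 1046592, 0 },
	{ 2097152,      72, 1 },
	{ 2097224, 1048504, 0 },
};

/* Compare both variants on the clusters from the log. */
static void check_totals(void)
{
	struct totals before = walk_clusters(log_clusters, 4, 0);
	struct totals after = walk_clusters(log_clusters, 4, 1);

	assert(before.added == 2095096 && before.reserved == 0);
	assert(after.added == 2097152 && after.reserved == 2056);
	assert(before.max_low_pfn == after.max_low_pfn);
}
```

With the clusters from the log, the old loop accounts 2095096 pages and
the new one 2097152 pages, 2056 of them reserved. max_low_pfn is 3145728
either way.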
Initmem setup node 0 [mem 0x0000000000f80000-0x00000005ffffffff]
percpu: Embedded 8 pages/cpu s27648 r8192 d29696 u65536
Built 1 zonelists, mobility grouping on.  Total pages: 2070535
Kernel command line: ro panic=5 domdadm root=/dev/md1 console=srm  mminit_debug=4
Dentry cache hash table entries: 2097152 (order: 11, 16777216 bytes, linear)
Inode-cache hash table entries: 1048576 (order: 10, 8388608 bytes, linear)
Sorting __ex_table...
mem auto-init: stack:off, heap alloc:off, heap free:off
BUG: Bad page state in process swapper  pfn:2ffc3f
page:fffffc000ecf0fc0 refcount:0 mapcount:1 mapping:0000000000000000 index:0x0
flags: 0x0()
raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
page dumped because: nonzero mapcount
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-03841-gfa3354e4ea39-dirty #26
       fffffc0001b5bd68 fffffc0001b5be80 fffffc00011cd148 fffffc000ecf0fc0
       fffffc00019803df fffffc0001b5be80 fffffc00011ce340 fffffc000ecf0fc0
       0000000000000000 fffffc0001b5be80 fffffc0001b482c0 fffffc00027d6618
       fffffc00027da7d0 00000000002ff966 0000000000000000 fffffc0001b5be80
       fffffc00011d1abc fffffc000ecf0fc0 fffffc0002d00000 fffffc0001b5be80
       fffffc0001b2350c 0000000000300000 fffffc0001b48298 fffffc0001b482c0
Trace:
[<fffffc00011cd148>] bad_page+0x168/0x1b0
[<fffffc00011ce340>] free_pcp_prepare+0x1e0/0x290
[<fffffc00011d1abc>] free_unref_page+0x2c/0xa0
[<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
[<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30
[<fffffc000101001c>] _stext+0x1c/0x20
...
Memory: 16496504K/16760768K available (8698K kernel code, 12790K rwdata, 2544K rodata, 304K init, 915K bss, 256576K reserved, 0K cma-reserved)

-- 
Sincerely yours,
Mike.