Re: linux-next: PowerPC boot failures in next-20120521
From: David Rientjes <rientjes@google.com>
Date: 2012-05-22 02:25:06
Also in: linux-next, lkml
On Tue, 22 May 2012, Michael Neuling wrote:
console [tty0] enabled
console [hvc0] enabled
pid_max: default: 32768 minimum: 301
Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
POWER7 performance monitor hardware support registered
Unable to handle kernel paging request for data at address 0x00001388
Faulting instruction address: 0xc00000000014a070
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in:
NIP: c00000000014a070 LR: c0000000001978cc CTR: c0000000000b6870
REGS: c00000007e5836b0 TRAP: 0300 Tainted: G W (3.4.0-rc6-mikey)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 28004022 XER: 02000000
SOFTE: 1
CFAR: 00000000000050fc
DAR: 0000000000001388, DSISR: 40000000
TASK = c00000007e560000[1] 'swapper/0' THREAD: c00000007e580000 CPU: 0
GPR00: 0000000000000000 c00000007e583930 c000000000c034d8 00000000000012d0
GPR04: 0000000000000000 0000000000001380 0000000000000000 0000000000000001
GPR08: c00000007e0dff60 0000000000000000 c000000000ca05a0 0000000000000000
GPR12: 0000000028004024 c00000000ff20000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000001380
GPR20: 0000000000000001 c000000000e14900 c000000000e148f0 0000000000000001
GPR24: c000000000c6f378 0000000000000000 0000000000001380 00000000000002aa
GPR28: 0000000000000000 0000000000000000 c000000000b576b0 c00000007e021200
NIP [c00000000014a070] .__alloc_pages_nodemask+0xd0/0x910
LR [c0000000001978cc] .new_slab+0xcc/0x3d0
Call Trace:
[c00000007e583930] [c00000007e5839c0] 0xc00000007e5839c0 (unreliable)
[c00000007e583ac0] [c0000000001978cc] .new_slab+0xcc/0x3d0
[c00000007e583b70] [c00000000072ae98] .__slab_alloc+0x38c/0x4f8
[c00000007e583cb0] [c000000000198190] .kmem_cache_alloc_node_trace+0x90/0x260
[c00000007e583d60] [c000000000a5a404] .numa_init+0x9c/0x188
[c00000007e583e00] [c00000000000aa30] .do_one_initcall+0x60/0x1e0
[c00000007e583ec0] [c000000000a40b60] .kernel_init+0x128/0x294
[c00000007e583f90] [c000000000020788] .kernel_thread+0x54/0x70
Instruction dump:
0b000000 eb1e8000 3b800000 801800a8 2f800000 409e001c 7860efe3 38000000
41820008 38000002 787c6fe2 7f9c0378 <e93a0008> 801800a4 3b600000 2fa90000
---[ end trace 31fd0ba7d8756002 ]---
Which seems to be this code in __alloc_pages_nodemask:
---
/*
* Check the zones suitable for the gfp_mask contain at least one
* valid zone. It's possible to have an empty zonelist as a result
* of GFP_THISNODE and a memoryless node
*/
if (unlikely(!zonelist->_zonerefs->zone))
c00000000014a070: e9 3a 00 08 ld r9,8(r26)
---
r26 is coming from r5, which is the struct zonelist *zonelist parameter
to __alloc_pages_nodemask. Having 0000000000001380 in there is clearly
a bogus pointer.
Bisecting it points to b4cdf91668c27a5a6a5a3ed4234756c042dd8288
b4cdf91 sched/numa: Implement numa balancer
Trying David's patch just posted doesn't fix it.

Hmm, what does CONFIG_DEBUG_VM say?