Re: linux-next: PowerPC boot failures in next-20120521

From: David Rientjes <rientjes@google.com>
Date: 2012-05-22 02:25:06
Also in: linux-next, lkml

On Tue, 22 May 2012, Michael Neuling wrote:

console [tty0] enabled
console [hvc0] enabled
pid_max: default: 32768 minimum: 301
Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
POWER7 performance monitor hardware support registered
Unable to handle kernel paging request for data at address 0x00001388
Faulting instruction address: 0xc00000000014a070
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in:
NIP: c00000000014a070 LR: c0000000001978cc CTR: c0000000000b6870
REGS: c00000007e5836b0 TRAP: 0300   Tainted: G        W     (3.4.0-rc6-mikey)
MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI>  CR: 28004022  XER: 02000000
SOFTE: 1
CFAR: 00000000000050fc
DAR: 0000000000001388, DSISR: 40000000
TASK = c00000007e560000[1] 'swapper/0' THREAD: c00000007e580000 CPU: 0
GPR00: 0000000000000000 c00000007e583930 c000000000c034d8 00000000000012d0 
GPR04: 0000000000000000 0000000000001380 0000000000000000 0000000000000001 
GPR08: c00000007e0dff60 0000000000000000 c000000000ca05a0 0000000000000000 
GPR12: 0000000028004024 c00000000ff20000 0000000000000000 0000000000000000 
GPR16: 0000000000000000 0000000000000000 0000000000000001 0000000000001380 
GPR20: 0000000000000001 c000000000e14900 c000000000e148f0 0000000000000001 
GPR24: c000000000c6f378 0000000000000000 0000000000001380 00000000000002aa 
GPR28: 0000000000000000 0000000000000000 c000000000b576b0 c00000007e021200 
NIP [c00000000014a070] .__alloc_pages_nodemask+0xd0/0x910
LR [c0000000001978cc] .new_slab+0xcc/0x3d0
Call Trace:
[c00000007e583930] [c00000007e5839c0] 0xc00000007e5839c0 (unreliable)
[c00000007e583ac0] [c0000000001978cc] .new_slab+0xcc/0x3d0
[c00000007e583b70] [c00000000072ae98] .__slab_alloc+0x38c/0x4f8
[c00000007e583cb0] [c000000000198190] .kmem_cache_alloc_node_trace+0x90/0x260
[c00000007e583d60] [c000000000a5a404] .numa_init+0x9c/0x188
[c00000007e583e00] [c00000000000aa30] .do_one_initcall+0x60/0x1e0
[c00000007e583ec0] [c000000000a40b60] .kernel_init+0x128/0x294
[c00000007e583f90] [c000000000020788] .kernel_thread+0x54/0x70
Instruction dump:
0b000000 eb1e8000 3b800000 801800a8 2f800000 409e001c 7860efe3 38000000 
41820008 38000002 787c6fe2 7f9c0378 <e93a0008> 801800a4 3b600000 2fa90000 
---[ end trace 31fd0ba7d8756002 ]---

Which seems to be this code in __alloc_pages_nodemask:
---
        /*
         * Check the zones suitable for the gfp_mask contain at least one
         * valid zone. It's possible to have an empty zonelist as a result
         * of GFP_THISNODE and a memoryless node
         */
        if (unlikely(!zonelist->_zonerefs->zone))
c00000000014a070:       e9 3a 00 08     ld      r9,8(r26)
---

r26 is copied from r5, which holds the struct zonelist *zonelist
parameter to __alloc_pages_nodemask.  A value of 0000000000001380 there
is clearly a bogus pointer.

Bisecting it points to b4cdf91668c27a5a6a5a3ed4234756c042dd8288
  b4cdf91 sched/numa: Implement numa balancer

Trying David's patch just posted doesn't fix it.

Hmm, what does CONFIG_DEBUG_VM say?
