Re: [Bug report] BUG: Kernel NULL pointer dereference at 0x00000069, filemap_release_folio+0x88/0xb0
From: Michael Ellerman <mpe@ellerman.id.au>
Date: 2022-09-30 02:01:35
Also in:
linux-ext4, linux-mm
Matthew Wilcox [off-list ref] writes:
On Tue, Sep 27, 2022 at 09:17:20AM +0800, Zorro Lang wrote:quoted
Hi mm and ppc list, Recently I started to hit a kernel panic [2] rarely on *ppc64le* with *1k blocksize* ext4. It's not easy to reproduce, but still has chance to trigger by loop running generic/048 on ppc64le (not sure all kind of ppc64le can reproduce it). Although I've reported a bug to ext4 [1] (more details refer to [1]), but I only hit it on ppc64le until now, and I'm not sure if it's an ext4 related bug, more likes folio related issue, so I cc mm and ppc mail list, hope to get more reviewing.Argh. This is the wrong way to do it. Please stop using bugzilla. Now there's discussion in two places and there's nowhere to see all of it.quoted
[ 4681.230907] BUG: Kernel NULL pointer dereference at 0x00000069 [ 4681.230922] Faulting instruction address: 0xc00000000068ee0c [ 4681.230929] Oops: Kernel access of bad area, sig: 11 [#1] [ 4681.230934] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries [ 4681.230991] CPU: 0 PID: 82 Comm: kswapd0 Kdump: loaded Not tainted 6.0.0-rc6+ #1 [ 4681.230999] NIP: c00000000068ee0c LR: c00000000068f2b8 CTR: 0000000000000000 [ 4681.238525] REGS: c000000006c0b560 TRAP: 0380 Not tainted (6.0.0-rc6+) [ 4681.238532] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 24028242 XER: 00000000 [ 4681.238556] CFAR: c00000000068edf4 IRQMASK: 0 [ 4681.238556] GPR00: c00000000068f2b8 c000000006c0b800 c000000002cf1700 c00c00000042f1c0 [ 4681.238556] GPR04: c000000006c0b860 0000000000000000 0000000000000002 0000000000000000 [ 4681.238556] GPR08: c000000002d404b0 0000000000000000 c00c00000042f1c0 0000000000000000 [ 4681.238556] GPR12: c0000000001cf080 c000000005100000 c000000000194298 c0000001fff9c480 [ 4681.238556] GPR16: c000000048cdb850 0000000000000007 0000000000000000 0000000000000000 [ 4681.238556] GPR20: 0000000000000001 c000000006c0b8f8 c00000000146b9d8 5deadbeef0000100 [ 4681.238556] GPR24: 5deadbeef0000122 c000000048cdb800 c000000006c0bc00 c000000006c0b8e8 [ 4681.238556] GPR28: c000000006c0b860 c00c00000042f1c0 0000000000000009 0000000000000009 [ 4681.238634] NIP [c00000000068ee0c] drop_buffers.constprop.0+0x4c/0x1c0 [ 4681.238643] LR [c00000000068f2b8] try_to_free_buffers+0x128/0x150 [ 4681.238650] Call Trace: [ 4681.238654] [c000000006c0b800] [c000000006c0b880] 0xc000000006c0b880 (unreliable) [ 4681.238663] [c000000006c0b840] [c000000006c0bc00] 0xc000000006c0bc00 [ 4681.238670] [c000000006c0b890] [c000000000498708] filemap_release_folio+0x88/0xb0 [ 4681.238679] [c000000006c0b8b0] [c0000000004c51c0] shrink_active_list+0x490/0x750 [ 4681.238688] [c000000006c0b9b0] [c0000000004c9f88] shrink_lruvec+0x3f8/0x430 [ 4681.238697] [c000000006c0baa0] [c0000000004ca1f4] shrink_node_memcgs+0x234/0x290 [ 4681.238704] [c000000006c0bb10] [c0000000004ca3c4] shrink_node+0x174/0x6b0 [ 4681.238711] [c000000006c0bbc0] [c0000000004cacf0] balance_pgdat+0x3f0/0x970 [ 4681.238718] [c000000006c0bd20] [c0000000004cb440] kswapd+0x1d0/0x450 [ 4681.238726] [c000000006c0bdc0] [c0000000001943d8] kthread+0x148/0x150 [ 4681.238735] [c000000006c0be10] [c00000000000cbe4] ret_from_kernel_thread+0x5c/0x64 [ 4681.238745] Instruction dump: [ 4681.238749] fbc1fff0 f821ffc1 7c7d1b78 7c9c2378 ebc30028 7fdff378 48000018 60000000 [ 4681.238765] 60000000 ebff0008 7c3ef840 41820048 <815f0060> e93f0000 5529077c 7d295378Running that through scripts/decodecode (with some minor hacks .. how do PPC people do this properly?)
We've just always used our own scripts. Mine is here: https://github.com/mpe/misc-scripts/blob/master/ppc/ppc-disasm I've added an issue to our tracker for us to get scripts/decodecode working on our oopses (eventually).
I get: 0: fb c1 ff f0 std r30,-16(r1) 4: f8 21 ff c1 stdu r1,-64(r1) 8: 7c 7d 1b 78 mr r29,r3 c: 7c 9c 23 78 mr r28,r4 10: eb c3 00 28 ld r30,40(r3) 14: 7f df f3 78 mr r31,r30 18: 48 00 00 18 b 0x30 1c: 60 00 00 00 nop 20: 60 00 00 00 nop 24: eb ff 00 08 ld r31,8(r31) 28: 7c 3e f8 40 cmpld r30,r31 2c: 41 82 00 48 beq 0x74 30:* 81 5f 00 60 lwz r10,96(r31) <-- trapping instruction 34: e9 3f 00 00 ld r9,0(r31) 38: 55 29 07 7c rlwinm r9,r9,0,29,30 3c: 7d 29 53 78 or r9,r9,r10 That would seem to track; 96 is 0x60 and r31 contains 0x00..09, giving us an effective address of 0x69. It would be nice to know what source line that corresponds to. Could you use scripts/faddr2line to turn drop_buffers.constprop.0+0x4c/0x1c0 into a line number? I can't because it needs the vmlinux you generated.
You'll need: https://lore.kernel.org/all/20220927075211.897152-1-srikar@linux.vnet.ibm.com/ (local) I don't have the same vmlinux obviously, but mine seems to match up pretty closely, I get: c0000000004e3900 <drop_buffers.constprop.0>: c0000000004e3900: b9 00 4c 3c addis r2,r12,185 c0000000004e3904: 00 c5 42 38 addi r2,r2,-15104 c0000000004e3908: a6 02 08 7c mflr r0 c0000000004e390c: 29 4f b8 4b bl c000000000068834 <_mcount> # ^ entry & ftrace stuff c0000000004e3910: e0 ff 81 fb std r28,-32(r1) c0000000004e3914: e8 ff a1 fb std r29,-24(r1) c0000000004e3918: 78 23 9c 7c mr r28,r4 c0000000004e391c: 78 1b 7d 7c mr r29,r3 c0000000004e3920: f8 ff e1 fb std r31,-8(r1) c0000000004e3924: f0 ff c1 fb std r30,-16(r1) c0000000004e3928: c1 ff 21 f8 stdu r1,-64(r1) # ^ save regs and create stack frame c0000000004e392c: 28 00 c3 eb ld r30,40(r3) # r30 = folio->private (0000000000000009) c0000000004e3930: 78 f3 df 7f mr r31,r30 # r31 = folio->private = head = bh c0000000004e3934: 18 00 00 48 b c0000000004e394c <drop_buffers.constprop.0+0x4c> -> c0000000004e3938: 00 00 00 60 nop c0000000004e393c: 00 00 42 60 ori r2,r2,0 c0000000004e3940: 08 00 ff eb ld r31,8(r31) c0000000004e3944: 40 f8 3e 7c cmpld r30,r31 c0000000004e3948: 48 00 82 41 beq c0000000004e3990 <drop_buffers.constprop.0+0x90> c0000000004e394c: 60 00 5f 81 lwz r10,96(r31) # r10 = bh->b_count $ ./scripts/faddr2line .build/vmlinux drop_buffers.constprop.0+0x4c drop_buffers.constprop.0+0x4c/0x170: arch_atomic_read at arch/powerpc/include/asm/atomic.h:30 (inlined by) atomic_read at include/linux/atomic/atomic-instrumented.h:28 (inlined by) buffer_busy at fs/buffer.c:2859 (inlined by) drop_buffers at fs/buffer.c:2871 static inline int buffer_busy(struct buffer_head *bh) { return atomic_read(&bh->b_count) | (bh->b_state & ((1 << BH_Dirty) | (1 << BH_Lock))); } struct folio { union { struct { long unsigned int flags; /* 0 8 */ union { struct list_head lru; /* 8 16 */ struct { void * __filler; /* 8 8 */ unsigned int mlock_count; /* 16 4 */ }; /* 8 16 */ }; /* 8 16 */ struct address_space * mapping; /* 24 8 */ long unsigned int index; /* 32 8 */ void * private; /* 40 8 */ <---- struct buffer_head { long unsigned int b_state; /* 0 8 */ struct buffer_head * b_this_page; /* 8 8 */ struct page * b_page; /* 16 8 */ sector_t b_blocknr; /* 24 8 */ size_t b_size; /* 32 8 */ char * b_data; /* 40 8 */ struct block_device * b_bdev; /* 48 8 */ bh_end_io_t * b_end_io; /* 56 8 */ void * b_private; /* 64 8 */ struct list_head b_assoc_buffers; /* 72 16 */ struct address_space * b_assoc_map; /* 88 8 */ atomic_t b_count; /* 96 4 */ <---- The buffer_head comes from folio_buffers(folio): static bool drop_buffers(struct folio *folio, struct buffer_head **buffers_to_free) { struct buffer_head *head = folio_buffers(folio); Which is == folio_get_private() r3 and r29 still hold folio = c00c00000042f1c0 That's a valid looking vmemmap address. So we have a valid folio, but its private field == 9 ? Seems like all sorts of things get stuffed into page->private, so presumably 9 is not necessarily a corrupt value, just not what we're expecting. But I'm out of my depth so over to you :) cheers