Thread: 20 messages, 6 authors, 2009-11-24

Which kernel options should be enabled to find the root cause of this bug?

From: Justin Piszcz <hidden>
Date: 2009-11-24 13:08:04
Also in: linux-xfs, lkml


On Sat, 17 Oct 2009, Justin Piszcz wrote:
Hello,

I have a system that I recently upgraded from 2.6.30.x. After approximately
24-48 hours (sometimes longer), the system cannot write any more files to disk,
although I can still write to /dev/shm, which is where I saved the
sysrq-t and sysrq-w output:

http://home.comcast.net/~jpiszcz/20091017/sysrq-w.txt
http://home.comcast.net/~jpiszcz/20091017/sysrq-t.txt

Configuration:

$ cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md1 : active raid1 sdb2[1] sda2[0]
     136448 blocks [2/2] [UU]

md2 : active raid1 sdb3[1] sda3[0]
     129596288 blocks [2/2] [UU]

md3 : active raid5 sdj1[7] sdi1[6] sdh1[5] sdf1[3] sdg1[4] sde1[2] sdd1[1] sdc1[0]
     5128001536 blocks level 5, 1024k chunk, algorithm 2 [8/8] [UUUUUUUU]

md0 : active raid1 sdb1[1] sda1[0]
     16787776 blocks [2/2] [UU]
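
Since the hang involves writes, it may be worth confirming mechanically that no array is degraded before looking further. The following is a minimal sketch, not part of the original report, that parses text in the /proc/mdstat format shown above; the function names `array_health` and `degraded` are hypothetical, and it assumes the first number in `[n/m]` is the number of devices the array expects and the second is the number currently active.

```python
import re


def array_health(mdstat_text):
    """Return {md_name: (expected, active)} parsed from /proc/mdstat text."""
    health = {}
    name = None
    for line in mdstat_text.splitlines():
        # Array header line, e.g. "md2 : active raid1 sdb3[1] sda3[0]"
        m = re.match(r"^(md\d+)\s*:", line)
        if m:
            name = m.group(1)
            continue
        # Status line, e.g. "129596288 blocks [2/2] [UU]"
        s = re.search(r"\[(\d+)/(\d+)\]\s+\[[U_]+\]", line)
        if s and name:
            health[name] = (int(s.group(1)), int(s.group(2)))
            name = None
    return health


def degraded(health):
    """Names of arrays with fewer active devices than expected."""
    return [n for n, (expected, active) in health.items() if active < expected]
```

With the output above, every array reports as many active devices as expected, so `degraded(...)` would return an empty list.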

$ mount
/dev/md2 on / type xfs (rw,noatime,nobarrier,logbufs=8,logbsize=262144)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/dev/md1 on /boot type ext3 (rw,noatime)
/dev/md3 on /r/1 type xfs (rw,noatime,nobarrier,logbufs=8,logbsize=262144)
rpc_pipefs on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)

Distribution: Debian Testing
Arch: x86_64

The problem occurs with 2.6.31; I upgraded to 2.6.31.4 and the problem
persists.

Here is a snippet of two processes in D-state; the first was not doing
anything, and the second was mrtg.

[121444.684000] pickup        D 0000000000000003     0 18407   4521 0x00000000
[121444.684000]  ffff880231dd2290 0000000000000086 0000000000000000 0000000000000000
[121444.684000]  000000000000ff40 000000000000c8c8 ffff880176794d10 ffff880176794f90
[121444.684000]  000000032266dd08 ffff8801407a87f0 ffff8800280878d8 ffff880176794f90
[121444.684000] Call Trace:
[121444.684000]  [<ffffffff810a742d>] ? free_pages_and_swap_cache+0x9d/0xc0
[121444.684000]  [<ffffffff81454866>] ? __mutex_lock_slowpath+0xd6/0x160
[121444.684000]  [<ffffffff814546ba>] ? mutex_lock+0x1a/0x40
[121444.684000]  [<ffffffff810b26ef>] ? generic_file_llseek+0x2f/0x70
[121444.684000]  [<ffffffff810b119e>] ? sys_lseek+0x7e/0x90
[121444.684000]  [<ffffffff8109ffd2>] ? sys_munmap+0x52/0x80
[121444.684000]  [<ffffffff8102c52b>] ? system_call_fastpath+0x16/0x1b

[121444.684000] rateup        D 0000000000000000     0 18538  18465 0x00000000
[121444.684000]  ffff88023f8a8c10 0000000000000082 0000000000000000 ffff88023ea09ec8
[121444.684000]  000000000000ff40 000000000000c8c8 ffff88023faace50 ffff88023faad0d0
[121444.684000]  0000000300003e00 000000010720cc78 0000000000003e00 ffff88023faad0d0
[121444.684000] Call Trace:
[121444.684000]  [<ffffffff811f42e2>] ? xfs_buf_iorequest+0x42/0x90
[121444.684000]  [<ffffffff811dd66d>] ? xlog_bdstrat_cb+0x3d/0x50
[121444.684000]  [<ffffffff811db05b>] ? xlog_sync+0x20b/0x4e0
[121444.684000]  [<ffffffff811dc44c>] ? xlog_state_sync+0x26c/0x2a0
[121444.684000]  [<ffffffff810513e0>] ? default_wake_function+0x0/0x10
[121444.684000]  [<ffffffff811dc4d1>] ? _xfs_log_force+0x51/0x80
[121444.684000]  [<ffffffff811dc50b>] ? xfs_log_force+0xb/0x40
[121444.684000]  [<ffffffff811a7223>] ? xfs_alloc_ag_vextent+0x123/0x130
[121444.684000]  [<ffffffff811a7aa8>] ? xfs_alloc_vextent+0x368/0x4b0
[121444.684000]  [<ffffffff811b41e8>] ? xfs_bmap_btalloc+0x598/0xa40
[121444.684000]  [<ffffffff811b6a42>] ? xfs_bmapi+0x9e2/0x11a0
[121444.684000]  [<ffffffff811dd7f0>] ? xlog_grant_push_ail+0x30/0xf0
[121444.684000]  [<ffffffff811e8fd8>] ? xfs_trans_reserve+0xa8/0x220
[121444.684000]  [<ffffffff811d805e>] ? xfs_iomap_write_allocate+0x23e/0x3b0
[121444.684000]  [<ffffffff811f0daf>] ? __xfs_get_blocks+0x8f/0x220
[121444.684000]  [<ffffffff811d8c00>] ? xfs_iomap+0x2c0/0x300
[121444.684000]  [<ffffffff810d5b76>] ? __set_page_dirty+0x66/0xd0
[121444.684000]  [<ffffffff811f0d15>] ? xfs_map_blocks+0x25/0x30
[121444.684000]  [<ffffffff811f1e04>] ? xfs_page_state_convert+0x414/0x6c0
[121444.684000]  [<ffffffff811f23b7>] ? xfs_vm_writepage+0x77/0x130
[121444.684000]  [<ffffffff8108b21a>] ? __writepage+0xa/0x40
[121444.684000]  [<ffffffff8108baff>] ? write_cache_pages+0x1df/0x3c0
[121444.684000]  [<ffffffff8108b210>] ? __writepage+0x0/0x40
[121444.684000]  [<ffffffff810b1533>] ? do_sync_write+0xe3/0x130
[121444.684000]  [<ffffffff8108bd30>] ? do_writepages+0x20/0x40
[121444.684000]  [<ffffffff81085abd>] ? __filemap_fdatawrite_range+0x4d/0x60
[121444.684000]  [<ffffffff811f54dd>] ? xfs_flush_pages+0xad/0xc0
[121444.684000]  [<ffffffff811ee907>] ? xfs_release+0x167/0x1d0
[121444.684000]  [<ffffffff811f52b0>] ? xfs_file_release+0x10/0x20
[121444.684000]  [<ffffffff810b2c0d>] ? __fput+0xcd/0x1e0
[121444.684000]  [<ffffffff810af556>] ? filp_close+0x56/0x90
[121444.684000]  [<ffffffff810af636>] ? sys_close+0xa6/0x100
[121444.684000]  [<ffffffff8102c52b>] ? system_call_fastpath+0x16/0x1b
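
The full sysrq-t dump is long, so pulling out only the D-state tasks by script may make it easier to compare hangs. Below is a rough sketch, assuming the log layout shown in the two traces above (a task header line followed by `[<address>] ? function+0x...` frames); the name `blocked_tasks` is hypothetical, and the regular expressions are a guess at the format rather than a complete parser (task names containing spaces, for example, are not handled).

```python
import re

# Task header line in sysrq-t output, e.g.
# "[121444.684000] rateup        D 0000000000000000     0 18538  18465"
TASK_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<comm>\S+)\s+(?P<state>[A-Z])\s+"
    r"[0-9a-f]{16}\s+\d+\s+(?P<pid>\d+)\s+(?P<ppid>\d+)"
)
# Stack frame line, e.g. "[<ffffffff811dc50b>] ? xfs_log_force+0xb/0x40"
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(?:\?\s+)?(?P<func>[\w.]+)\+0x")


def blocked_tasks(log_text):
    """Return a list of (comm, pid, [functions]) for tasks in state D."""
    tasks = []
    current = None
    for line in log_text.splitlines():
        m = TASK_RE.match(line)
        if m:
            # A new task starts; only keep collecting frames if it is in D.
            current = None
            if m.group("state") == "D":
                current = (m.group("comm"), int(m.group("pid")), [])
                tasks.append(current)
            continue
        f = FRAME_RE.search(line)
        if f and current is not None:
            current[2].append(f.group("func"))
    return tasks
```

Run against a saved dump, for example `blocked_tasks(open("sysrq-t.txt").read())`, this returns one entry per blocked task with the function names from its call trace in the order printed.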

Anyone know what is going on here?

Justin.

In addition to using netconsole, which kernel options should be enabled
to better diagnose this issue?

Should I enable these to help track down this bug?

[ ]   XFS Debugging support (EXPERIMENTAL)
[ ] Compile the kernel with frame pointers
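
For reference, a small sketch for checking whether a given kernel config already has these two options set. It assumes the menu entries above correspond to the Kconfig symbols CONFIG_XFS_DEBUG and CONFIG_FRAME_POINTER, and that the running kernel exposes its config at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC); the function names are hypothetical.

```python
import gzip
import re

# Assumed symbol names for the two menu entries quoted above.
WANTED = ("CONFIG_XFS_DEBUG", "CONFIG_FRAME_POINTER")


def config_status(config_text, symbols=WANTED):
    """Map each symbol to its value ('y', 'm', ...) or None if unset/absent."""
    status = {name: None for name in symbols}
    for line in config_text.splitlines():
        # Set options look like "CONFIG_FOO=y"; unset ones appear as
        # "# CONFIG_FOO is not set" and are left as None here.
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m and m.group(1) in status:
            status[m.group(1)] = m.group(2)
    return status


def read_running_config(path="/proc/config.gz"):
    """Read the running kernel's config text from a gzip-compressed file."""
    with gzip.open(path, "rt") as fh:
        return fh.read()
```

The same `config_status` function also works on the text of a `.config` file from the build tree, read with a plain `open()`.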

Are there any other options that will help determine the root cause of this
bug that are recommended?

Justin.
