Re: xfs hang when filesystem filled
From: Dave Chinner <david@fromorbit.com>
Date: 2012-08-08 22:23:16
On Tue, Aug 07, 2012 at 02:54:48PM +0900, Guk-Bong, Kwon wrote:
Hi all,

I tested xfs over nfs using bonnie++.
xfs and nfs hang when xfs filesystem filled.

What's the problem?
It appears to be blocked in writeback, getting ENOSPC errors when they shouldn't occur.
see below
--------------------------------
1. nfs server
a. uname -a
- Linux nfs_server 2.6.32.58 #1 SMP Thu Mar 22 13:33:34 KST 2012 x86_64
Intel(R) Xeon(R) CPU E5606 @ 2.13GHz GenuineIntel GNU/Linux

Old kernel. Upgrade.
================================================================================
/test 0.0.0.0/0.0.0.0(rw,async,wdelay,hide,nocrossmnt,insecure,no_root_squash,no_all_squash,no_subtree_check,secure_locks,acl,fsid=1342087477,anonuid=65534,anongid=65534)
================================================================================
You're using the async export option, which means the server/client write throttling mechanisms built into the NFS protocol are not active. That leads to clients swamping the server with dirty data and not backing off when the server is overloaded, and leads to -data loss- when the server fails.

IOWs, you're massively overcommitting allocation from lots of threads, which means you are probably depleting the free space pool, and that leads to -data loss- and potentially deadlocks. If this is what your production systems do, then a) increase the reserve pool, and b) fix your production systems not to do this.
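To make the async point concrete, here is a minimal Python 3 sketch that scans an exports-style line and reports any client exported with async. The helper names (parse_export_line, risky_exports) and the RISKY_OPTIONS set are hypothetical, and the parser assumes the simple "path client(options)" layout shown in the report rather than the full exports(5) grammar.

```python
import re

# Export options this sketch treats as risky. Only "async" comes from the
# thread; extend the set at your own discretion.
RISKY_OPTIONS = {"async"}

# The export line as quoted in the report.
REPORTED_EXPORT = (
    "/test 0.0.0.0/0.0.0.0(rw,async,wdelay,hide,nocrossmnt,insecure,"
    "no_root_squash,no_all_squash,no_subtree_check,secure_locks,acl,"
    "fsid=1342087477,anonuid=65534,anongid=65534)"
)


def parse_export_line(line):
    """Split one exports-style line into (path, [(client, [options])]).

    Returns None for blank or comment-only lines.
    """
    fields = line.split("#", 1)[0].split()
    if not fields:
        return None
    path, clients = fields[0], []
    for spec in fields[1:]:
        # client(opt1,opt2,...) where the parenthesised part is optional
        m = re.fullmatch(r"([^()]*)(?:\(([^()]*)\))?", spec)
        if m is None:
            continue  # unexpected layout: skip rather than guess
        options = [o for o in (m.group(2) or "").split(",") if o]
        clients.append((m.group(1), options))
    return path, clients


def risky_exports(line):
    """Return (path, client, option) for every risky option on the line."""
    parsed = parse_export_line(line)
    if parsed is None:
        return []
    path, clients = parsed
    return [
        (path, client, opt)
        for client, options in clients
        for opt in options
        if opt in RISKY_OPTIONS
    ]


if __name__ == "__main__":
    for path, client, opt in risky_exports(REPORTED_EXPORT):
        print(f"{path} exported to {client} with '{opt}'")
```

Switching the flagged export to sync and re-exporting is the fix for (b). For (a), the reserve pool is, as far as I know, read and set through the expert-mode resblks command in xfs_io; check xfs_io(8) on your own system before relying on that.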
Aug 2 18:17:58 anystor1 kernel: Call Trace:
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811738ce>] ? xfs_btree_is_lastrec+0x4e/0x60
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8135edad>] ? schedule_timeout+0x1ed/0x250
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8135fcd1>] ? __down+0x61/0xa0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff810572d6>] ? down+0x46/0x50
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811af6a4>] ? _xfs_buf_find+0x134/0x220
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811af7fe>] ? xfs_buf_get_flags+0x6e/0x190
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811a525e>] ? xfs_trans_get_buf+0x10e/0x160
Aug 2 18:17:58 anystor1 kernel: [<ffffffff81161954>] ? xfs_alloc_fix_freelist+0x144/0x450
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8119e597>] ? xfs_icsb_disable_counter+0x17/0x160
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8116d2f2>] ? xfs_bmap_add_extent_delay_real+0x8d2/0x11a0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811a4b83>] ? xfs_trans_log_buf+0x63/0xa0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8119e731>] ? xfs_icsb_balance_counter_locked+0x31/0xf0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff81161ed1>] ? xfs_alloc_vextent+0x1b1/0x4c0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8116e946>] ? xfs_bmap_btalloc+0x596/0xa70
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8117125a>] ? xfs_bmapi+0x9fa/0x1230
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811965f6>] ? xlog_state_release_iclog+0x56/0xe0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811a3a0f>] ? xfs_trans_reserve+0x9f/0x210
Aug 2 18:17:58 anystor1 kernel: [<ffffffff81192d0e>] ? xfs_iomap_write_allocate+0x24e/0x3d0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811c29c0>] ? elv_insert+0xf0/0x260
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8119396b>] ? xfs_iomap+0x2cb/0x300
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811aba05>] ? xfs_map_blocks+0x25/0x30
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811acb64>] ? xfs_page_state_convert+0x414/0x6d0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811ad137>] ? xfs_vm_writepage+0x77/0x130
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8107c8ca>] ? __writepage+0xa/0x40
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8107d0af>] ? write_cache_pages+0x1df/0x3d0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8107c8c0>] ? __writepage+0x0/0x40
Aug 2 18:17:58 anystor1 kernel: [<ffffffff81076f4c>] ? __filemap_fdatawrite_range+0x4c/0x60
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811da3a1>] ? radix_tree_gang_lookup+0x71/0xf0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811b029d>] ? xfs_flush_pages+0xad/0xc0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811b795a>] ? xfs_sync_inode_data+0xca/0xf0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811b7aa0>] ? xfs_inode_ag_walk+0x80/0x140
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811b7890>] ? xfs_sync_inode_data+0x0/0xf0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811b7be8>] ? xfs_inode_ag_iterator+0x88/0xd0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811b7890>] ? xfs_sync_inode_data+0x0/0xf0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff8135ed1d>] ? schedule_timeout+0x15d/0x250
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811b7f40>] ? xfs_sync_data+0x30/0x60
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811b7f8e>] ? xfs_flush_inodes_work+0x1e/0x50
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811b726c>] ? xfssyncd+0x13c/0x1d0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff811b7130>] ? xfssyncd+0x0/0x1d0
Aug 2 18:17:58 anystor1 kernel: [<ffffffff810529d6>] ? kthread+0x96/0xb0
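A trace like the one above is easier to read once it is reduced to bare symbol names, innermost frame first. A minimal Python 3 sketch, assuming syslog-style lines in the "[<addr>] ? symbol+0xoff/0xsize" layout shown here; parse_frames and xfs_symbols are hypothetical helper names. On x86 kernels a "?" before the symbol generally marks an address found on the stack that the unwinder could not confirm as part of the call chain, so the sketch records that as unreliable.

```python
import re

# One stack frame: [<address>] optional "?" symbol+0xoffset/0xsize
FRAME = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_frames(log_text):
    """Return [(symbol, reliable)] in printed order (innermost frame first).

    'reliable' is False when the frame carries a '?' marker.
    """
    return [
        (m.group(3), m.group(2) is None)
        for m in FRAME.finditer(log_text)
    ]


def xfs_symbols(frames):
    """Keep only the symbols that belong to XFS, judged by name prefix."""
    return [name for name, _ in frames if name.lstrip("_").startswith("xfs")]


if __name__ == "__main__":
    # A short excerpt of the trace above, innermost frames only.
    excerpt = (
        "Aug 2 18:17:58 anystor1 kernel: [<ffffffff8135fcd1>] ? __down+0x61/0xa0 "
        "Aug 2 18:17:58 anystor1 kernel: [<ffffffff810572d6>] ? down+0x46/0x50 "
        "Aug 2 18:17:58 anystor1 kernel: [<ffffffff811af6a4>] ? _xfs_buf_find+0x134/0x220 "
        "Aug 2 18:17:58 anystor1 kernel: [<ffffffff811af7fe>] ? xfs_buf_get_flags+0x6e/0x190"
    )
    for name, reliable in parse_frames(excerpt):
        print(name if reliable else name + " (?)")
```

Read bottom-up, the full trace runs from xfssyncd through xfs_vm_writepage and the allocator down to _xfs_buf_find, which sleeps in down().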
There's your problem - writeback of data is blocked waiting on a metadata buffer, and everything else is blocked behind it. Upgrade your kernel.

In summary, you are doing something silly on a very old kernel and you broke it. As a prize, you get to keep all the broken pieces.....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs