Re: [PATCH 03/27] xfs: use write_cache_pages for writeback clustering
From: Dave Chinner <david@fromorbit.com>
Date: 2011-06-30 02:48:51
On Thu, Jun 30, 2011 at 12:00:13PM +1000, Dave Chinner wrote:
On Wed, Jun 29, 2011 at 10:01:12AM -0400, Christoph Hellwig wrote:quoted
Instead of implementing our own writeback clustering use write_cache_pages to do it for us. This means the guts of the current writepage implementation become a new helper used both for implementing ->writepage and as a callback to write_cache_pages for ->writepages. A new struct xfs_writeback_ctx is used to track block mapping state and the ioend chain over multiple invocation of it. The advantage over the old code is that we avoid a double pagevec lookup, and a more efficient handling of extent boundaries inside a page for small blocksize filesystems, as well as having less XFS specific code.Yes, it should be, but I can't actually measure any noticable CPU usage difference @800MB/s writeback. The profiles change shape around the changed code, but overall cpu usage does not change. I think this is because the second pagevec lookup is pretty much free because the radix tree is already hot in cache when we do the second lookup...quoted
The downside is that we don't do writeback clustering when called from kswapd anyore, but that is a case that should be avoided anyway. Note that we still convert the whole delalloc range from ->writepage, so the on-disk allocation pattern is not affected.All the more reason to ensure the mm subsystem doesn't do this.... .....quoted
error: - if (iohead) - xfs_cancel_ioend(iohead); - - if (err == -EAGAIN) - goto redirty; -Should this EAGAIN handling be dealt with in the removing-the-non- blocking-mode patch?quoted
+STATIC int xfs_vm_writepages( struct address_space *mapping, struct writeback_control *wbc) { + struct xfs_writeback_ctx ctx = { }; + int ret; + xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); - return generic_writepages(mapping, wbc); + + ret = write_cache_pages(mapping, wbc, __xfs_vm_writepage, &ctx); + + if (ctx.iohead) { + if (ret) + xfs_cancel_ioend(ctx.iohead); + else + xfs_submit_ioend(wbc, ctx.iohead); + }I think this error handling does not work. If we have put pages into the ioend (i.e. successful ->writepage calls) and then have a ->writepage call fail, we'll get all the pages under writeback (i.e. those on the ioend) remain in that state, and not ever get written back (so move into the clean state) or redirtied (so written again later) xfs_cancel_ioend() was only ever called for the first page sent down to ->writepage, and on error that page was redirtied separately. Hence it doesn't handle this case at all as it never occurs in the existing code. I'd suggest that regardless of whether an error is returned here, the existence of ctx.iohead indicates a valid ioend that needs to be submitted....
I think i just tripped this. I'm running a 1k block size filesystem, and test 224 has hung with waiting on IO completion after .writepage errors: [ 2850.300979] XFS (vdb): Mounting Filesystem [ 2850.310069] XFS (vdb): Ending clean mount [ 2867.246341] Filesystem "vdb": reserve blocks depleted! Consider increasing reserve pool size. [ 2867.247652] XFS (vdb): page discard on page ffffea0000257b40, inode 0x1c6, offset 1187840. [ 2867.254135] XFS (vdb): page discard on page ffffea0000025f40, inode 0x423, offset 1839104. [ 2867.256289] XFS (vdb): page discard on page ffffea0000a21aa0, inode 0x34e, offset 28672. [ 2867.258845] XFS (vdb): page discard on page ffffea00001830d0, inode 0xe5, offset 3637248. [ 2867.260637] XFS (vdb): page discard on page ffffea0000776af8, inode 0x132, offset 6283264. [ 2867.269380] XFS (vdb): page discard on page ffffea00009d5d38, inode 0xf1, offset 5632000. [ 2867.277851] XFS (vdb): page discard on page ffffea0000017e60, inode 0x27a, offset 32768. [ 2867.281165] XFS (vdb): page discard on page ffffea0000258278, inode 0x274, offset 32768. [ 2867.282802] XFS (vdb): page discard on page ffffea00009a3c60, inode 0x48a, offset 32768. [ 2867.284166] XFS (vdb): page discard on page ffffea0000cc7808, inode 0x42e, offset 32768. [ 2867.287138] XFS (vdb): page discard on page ffffea00004d4440, inode 0x4e0, offset 32768. [ 2867.288500] XFS (vdb): page discard on page ffffea0000b34978, inode 0x4cd, offset 32768. [ 2867.289381] XFS (vdb): page discard on page ffffea00003f40f8, inode 0x4c4, offset 155648. [ 2867.291536] XFS (vdb): page discard on page ffffea0000023578, inode 0x4c7, offset 32768. [ 2867.300880] XFS (vdb): page discard on page ffffea00005276e8, inode 0x4cc, offset 32768. [ 2867.318819] XFS (vdb): page discard on page ffffea0000777230, inode 0x449, offset 8581120. [ 4701.141666] SysRq : Show Blocked State [ 4701.142093] task PC stack pid father [ 4701.142707] dd D ffff8800076edbe8 0 14211 8946 0x00000000 [ 4701.143509] ffff88002b03fa58 0000000000000086 ffffea00002db598 ffffea0000000000 [ 4701.144009] ffff88002b03f9d8 ffffffff81113a35 ffff8800076ed860 0000000000010f80 [ 4701.144009] ffff88002b03ffd8 ffff88002b03e010 ffff88002b03ffd8 0000000000010f80 [ 4701.144009] Call Trace: [ 4701.144009] [<ffffffff81113a35>] ? __free_pages+0x35/0x40 [ 4701.144009] [<ffffffff81062f69>] ? default_spin_lock_flags+0x9/0x10 [ 4701.144009] [<ffffffff8110b520>] ? __lock_page+0x70/0x70 [ 4701.144009] [<ffffffff81afe2d0>] io_schedule+0x60/0x80 [ 4701.144009] [<ffffffff8110b52e>] sleep_on_page+0xe/0x20 [ 4701.144009] [<ffffffff81afec2f>] __wait_on_bit+0x5f/0x90 [ 4701.144009] [<ffffffff8110b773>] wait_on_page_bit+0x73/0x80 [ 4701.144009] [<ffffffff810a4110>] ? autoremove_wake_function+0x40/0x40 [ 4701.144009] [<ffffffff81116365>] ? pagevec_lookup_tag+0x25/0x40 [ 4701.144009] [<ffffffff8110bbc2>] filemap_fdatawait_range+0x112/0x1a0 [ 4701.144009] [<ffffffff8145f469>] xfs_wait_on_pages+0x59/0x80 [ 4701.144009] [<ffffffff8145f51d>] xfs_flush_pages+0x8d/0xb0 [ 4701.144009] [<ffffffff8145f084>] xfs_file_buffered_aio_write+0x104/0x190 [ 4701.144009] [<ffffffff81b03a98>] ? do_page_fault+0x1e8/0x450 [ 4701.144009] [<ffffffff8145f2cf>] xfs_file_aio_write+0x1bf/0x300 [ 4701.144009] [<ffffffff81160844>] ? path_openat+0x104/0x3f0 [ 4701.144009] [<ffffffff8115251a>] do_sync_write+0xda/0x120 [ 4701.144009] [<ffffffff816488b3>] ? security_file_permission+0x23/0x90 [ 4701.144009] [<ffffffff81152a88>] vfs_write+0xc8/0x180 [ 4701.144009] [<ffffffff81152c31>] sys_write+0x51/0x90 [ 4701.144009] [<ffffffff81b07ec2>] system_call_fastpath+0x16/0x1b Cheers, Dave. -- Dave Chinner david@fromorbit.com _______________________________________________ xfs mailing list xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs