Re: [PATCH] ext4: Rework the ext4_da_writepages
From: Aneesh Kumar K.V <hidden>
Date: 2008-08-01 05:07:45
Subsystem:
ext4 file system, filesystems (vfs and infrastructure), the rest · Maintainers:
"Theodore Ts'o", Alexander Viro, Christian Brauner, Linus Torvalds
On Fri, Aug 01, 2008 at 10:24:12AM +0530, Aneesh Kumar K.V wrote:
> On Thu, Jul 31, 2008 at 02:10:55PM -0600, Andreas Dilger wrote:
> > On Jul 31, 2008 23:03 +0530, Aneesh Kumar wrote:
> > > With the below changes we reserve only the credits needed to
> > > insert one extent resulting from a single get_block call. That
> > > makes sure we don't take too many journal credits during
> > > writeout. We also don't limit the pages to write. That means we
> > > loop through the dirty pages building the largest possible
> > > contiguous block request. Then we issue a single get_block
> > > request. We may get fewer blocks than we requested. If so, we
> > > end up not mapping some of the buffer_heads. That means those
> > > buffer_heads are still marked delay. Later, in the writepage
> > > callback via __mpage_writepage, we redirty those pages.
> >
> > Can you please clarify this? Does this mean we take one pass
> > through the dirty pages, but possibly do not allocate some subset
> > of the pages? Then, at some later time, these holes are written
> > out separately? This seems like it would produce fragmentation if
> > we do not work to ensure the pages are allocated in sequence.
> > Maybe I'm misunderstanding your comment and the unmapped pages are
> > immediately mapped on the next loop?
>
> We take multiple passes through the dirty pages until
> wbc->nr_to_write is <= 0 or we don't have anything more to write.
> But if get_block doesn't return the requested number of blocks, we
> may not write out some of the pages. Whether this can result in a
> disk layout worse than the current one, I am not sure. I haven't
> looked at the layout yet. But the pages which are skipped are
> redirtied via redirty_page_for_writepage and will be forced out
> for writeout.
>
> Well, we can do better by setting wbc->encountered_congestion = 1,
> even though we are not really congested. That would cause most of
> the pdflush work functions to retry writeback_inodes.
>
> 	for (;;) {
> 		...
> 		wbc.pages_skipped = 0;
> 		writeback_inodes(&wbc);
> 		...
> 		if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
> 			/* Wrote less than expected */
> 			if (wbc.encountered_congestion || wbc.more_io)
> 				congestion_wait(WRITE, HZ/10);
> 			else
> 				break;
> 		}
> 	}

like below ?
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 53a8fc7..6fd527c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1773,6 +1773,14 @@ static void mpage_da_map_blocks(struct mpage_da_data *mpd)
 		return;
 	BUG_ON(new.b_size == 0);
 
+	if (new.b_size < lbh->b_size) {
+		/*
+		 * allocated less blocks. force writepages
+		 * to be called again
+		 */
+		mpd->wbc->more_io = 1;
+	}
+
 	if (buffer_new(&new))
 		__unmap_underlying_blocks(mpd->inode, &new);
 
@@ -1876,6 +1884,8 @@ static int __mpage_da_writepage(struct page *page,
 			 * skip rest of the page in the page_vec
 			 */
 			mpd->io_done = 1;
+			/* We want writepages to be called again */
+			wbc->more_io = 1;
 			redirty_page_for_writepage(wbc, page);
 			unlock_page(page);
 			return MPAGE_DA_EXTENT_TAIL;
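
To make the intended effect of more_io easier to see, here is a rough user-space model of the retry behaviour, not kernel code. Everything named sim_* is hypothetical and exists only for this sketch: struct sim_wbc stands in for the three writeback_control fields discussed above, sim_get_block pretends the allocator never maps more than max_extent blocks per call, and one block per page is assumed. The caller loop is only loosely modelled on the pdflush loop quoted earlier (congestion_wait is replaced by an immediate retry).

```c
#include <assert.h>

/* Hypothetical stand-in for the writeback_control fields discussed here. */
struct sim_wbc {
	long nr_to_write;	/* remaining page budget for this pass */
	long pages_skipped;	/* pages redirtied instead of written */
	int more_io;		/* ask the caller to run writepages again */
};

/* Hypothetical allocator: maps at most max_extent blocks per request. */
static long sim_get_block(long requested, long max_extent)
{
	return requested < max_extent ? requested : max_extent;
}

/*
 * One writepages pass: build a single contiguous request, issue one
 * get_block call, write what was mapped and redirty the remainder.
 */
static void sim_writepages(struct sim_wbc *wbc, long *dirty, long max_extent)
{
	long want = *dirty < wbc->nr_to_write ? *dirty : wbc->nr_to_write;
	long got = sim_get_block(want, max_extent);

	*dirty -= got;
	wbc->nr_to_write -= got;
	if (got < want) {
		/* allocated fewer blocks than requested */
		wbc->pages_skipped += want - got;
		wbc->more_io = 1;
	}
}

/*
 * Caller loop. Returns the number of pages still dirty when the loop
 * gives up and stores the number of passes in *passes.
 */
static long sim_writeback(long dirty, long budget, long max_extent,
			  int honor_more_io, int *passes)
{
	assert(budget > 0 && max_extent > 0);
	*passes = 0;

	for (;;) {
		struct sim_wbc wbc = {
			.nr_to_write = budget,
			.pages_skipped = 0,
			.more_io = 0,
		};

		sim_writepages(&wbc, &dirty, max_extent);
		(*passes)++;

		if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
			/* Wrote less than expected */
			if (honor_more_io && wbc.more_io)
				continue; /* the kernel would wait here first */
			break;
		}
		if (dirty == 0)
			break;
	}
	return dirty;
}
```

With 10 dirty pages, a budget of 1024 and an allocator that hands out at most 4 blocks per call, sim_writeback(10, 1024, 4, 1, &passes) drains everything in three passes, while the same call with honor_more_io set to 0 stops after the first short allocation and leaves 6 pages dirty. That second case is the situation the more_io change is meant to avoid, under the assumptions of this model.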