Re: [PATCH] ext4: Rework the ext4_da_writepages

From: Aneesh Kumar K.V <hidden>
Date: 2008-08-01 05:07:45
Subsystem: ext4 file system, filesystems (vfs and infrastructure), the rest · Maintainers: "Theodore Ts'o", Alexander Viro, Christian Brauner, Linus Torvalds

On Fri, Aug 01, 2008 at 10:24:12AM +0530, Aneesh Kumar K.V wrote:

On Thu, Jul 31, 2008 at 02:10:55PM -0600, Andreas Dilger wrote:


On Jul 31, 2008  23:03 +0530, Aneesh Kumar wrote:


With the changes below, we reserve the credits needed to insert only one extent,
resulting from a single get_block call. That makes sure we don't take
too many journal credits during writeout. We also don't limit the pages
to write. That means we loop through the dirty pages, building the largest
possible contiguous block request. Then we issue a single get_block request.
We may get fewer blocks than we requested. If so, we end up not mapping
some of the buffer_heads, which means those buffer_heads are still marked delayed.
Later, in the writepage callback via __mpage_writepage, we redirty those pages.
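
A rough userspace model of the single-extent scheme described above may help. It is a sketch, not ext4 code: `struct da_page`, `mock_get_blocks()` and `da_map_one_extent()` are hypothetical names, each page is assumed to map exactly one block identified by its logical block number, and the allocator is assumed to cap every request at `max_alloc` blocks (which must be at least 1), standing in for a get_block call that returns fewer blocks than requested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page state; in ext4 this lives in buffer_head flags. */
struct da_page {
	unsigned long lblk;  /* logical block backing this page (1 block/page assumed) */
	int delayed;         /* still awaiting allocation (like buffer_delay) */
	int mapped;          /* block allocated (like buffer_mapped) */
	int redirtied;       /* left for a later writeback pass */
};

/* Hypothetical allocator: may hand back fewer blocks than requested. */
static size_t mock_get_blocks(size_t requested, size_t max_alloc)
{
	return requested < max_alloc ? requested : max_alloc;
}

/*
 * One pass: starting at the first delayed page, collect the longest run of
 * logically contiguous delayed pages, issue a single allocation request for
 * the whole run, map what we got and redirty the rest of the run.
 * Returns the number of pages mapped (0 when nothing is left to write).
 */
static size_t da_map_one_extent(struct da_page *pages, size_t n, size_t max_alloc)
{
	size_t start = 0, len, got, i;

	while (start < n && !pages[start].delayed)
		start++;
	if (start == n)
		return 0;

	len = 1;
	while (start + len < n && pages[start + len].delayed &&
	       pages[start + len].lblk == pages[start].lblk + len)
		len++;

	got = mock_get_blocks(len, max_alloc);
	assert(got > 0);  /* mirrors BUG_ON(new.b_size == 0) */

	for (i = start; i < start + len; i++) {
		if (i < start + got) {
			pages[i].delayed = 0;
			pages[i].mapped = 1;
		} else {
			pages[i].redirtied = 1;  /* still delayed: redirty the page */
		}
	}
	return got;
}
```

Each call corresponds to one get_block request and therefore at most one new extent, which is why the credit reservation in this model only ever has to cover a single extent insert. The pages beyond `got` stay delayed and are picked up by a later call.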

Can you please clarify this? Does this mean we take one pass through the
dirty pages, but possibly do not allocate some subset of the pages? Then,
at some later time, these holes are written out separately? This seems
like it would produce fragmentation if we do not work to ensure the pages
are allocated in sequence. Maybe I'm misunderstanding your comment and
the unmapped pages are immediately mapped on the next loop?

We take multiple passes through the dirty pages until wbc->nr_to_write is
<= 0 or we have nothing more to write. But if get_block doesn't
return the requested number of blocks, we may not write out
some of the pages. Whether this can result in a disk layout worse than
the current one, I am not sure; I haven't looked at the layout yet.
But the pages which are skipped are redirtied via
redirty_page_for_writepage and will be picked up again for writeout. Well,
we can do better by setting wbc->encountered_congestion = 1, even
though we are not really congested. That would cause most of the pdflush
work functions to retry writeback_inodes:

for (;;) {
	...
	wbc.pages_skipped = 0;
	writeback_inodes(&wbc);
	...

	if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
		/* Wrote less than expected */
		if (wbc.encountered_congestion || wbc.more_io)
			congestion_wait(WRITE, HZ/10);
		else
			break;
	}
}
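
The exit condition of the loop above can be isolated into a small pure function, which makes it easier to see why setting `more_io` (or `encountered_congestion`) keeps pdflush from giving up after a short write. This is a simplified model: `struct wbc_model`, `enum wb_step` and `wb_next_step()` are hypothetical, only the four writeback_control fields used in the excerpt are kept, the elided parts of the loop (such as its dirty-threshold check) are left out, and the congestion_wait() call is reduced to a returned value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: only the writeback_control fields the loop inspects. */
struct wbc_model {
	long nr_to_write;            /* budget left after writeback_inodes() */
	long pages_skipped;          /* pages redirtied instead of written */
	int encountered_congestion;
	int more_io;
};

enum wb_step {
	WB_NEXT_ROUND,      /* full budget written: go round again */
	WB_WAIT_AND_RETRY,  /* short write, but a retry was requested */
	WB_STOP             /* short write and nothing asks for a retry */
};

static enum wb_step wb_next_step(const struct wbc_model *wbc)
{
	assert(wbc != NULL);

	if (wbc->nr_to_write > 0 || wbc->pages_skipped > 0) {
		/* Wrote less than expected */
		if (wbc->encountered_congestion || wbc->more_io)
			return WB_WAIT_AND_RETRY;  /* congestion_wait(WRITE, HZ/10) */
		return WB_STOP;
	}
	return WB_NEXT_ROUND;
}
```

In this model, a short allocation leaves `nr_to_write > 0` (and bumps `pages_skipped`), so without one of the two flags the result is `WB_STOP` even though dirty pages remain.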

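Something like the patch below? It sets wbc->more_io, which the loop above
also treats as a reason to retry, rather than faking congestion.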

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 53a8fc7..6fd527c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c

@@ -1773,6 +1773,14 @@ static void mpage_da_map_blocks(struct mpage_da_data *mpd)
 		return;
 	BUG_ON(new.b_size == 0);
 
+	if (new.b_size < lbh->b_size) {
+		/*
+		 * We allocated fewer blocks than requested;
+		 * force writepages to be called again.
+		 */
+		mpd->wbc->more_io = 1;
+	}
+
 	if (buffer_new(&new))
 		__unmap_underlying_blocks(mpd->inode, &new);

@@ -1876,6 +1884,8 @@ static int __mpage_da_writepage(struct page *page,
 			 * skip rest of the page in the page_vec
 			 */
 			mpd->io_done = 1;
+			/* We want writepages to be called again */
+			wbc->more_io = 1;
 			redirty_page_for_writepage(wbc, page);
 			unlock_page(page);
 			return MPAGE_DA_EXTENT_TAIL;
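
To see what the patch changes end to end, here is a toy simulation combining both ideas. It is only a sketch under stated assumptions: `struct toy_file`, `struct toy_wbc`, `toy_writepages()` and `toy_background_writeout()` are hypothetical, one page maps one block, each writepages call issues a single allocation request (real ext4 keeps looping within `nr_to_write`), a short allocation sets `more_io` as in the first hunk, congestion is ignored, and "no delayed blocks left" stands in for the real loop's dirty-threshold check.

```c
#include <assert.h>

/* Hypothetical toy state: a file with some delayed-allocation blocks. */
struct toy_file {
	long delayed;    /* blocks still waiting for allocation */
	long max_alloc;  /* largest extent the allocator hands out per call */
};

struct toy_wbc {
	long nr_to_write;
	long pages_skipped;
	int more_io;
};

/* One writepages call: request every remaining block the budget allows,
 * get at most max_alloc of them, and redirty (skip) the rest. */
static void toy_writepages(struct toy_file *f, struct toy_wbc *wbc, int set_more_io)
{
	long want = f->delayed < wbc->nr_to_write ? f->delayed : wbc->nr_to_write;
	long got = want < f->max_alloc ? want : f->max_alloc;

	f->delayed -= got;
	wbc->nr_to_write -= got;
	wbc->pages_skipped += want - got;  /* pages left to redirty */
	if (got < want && set_more_io)
		wbc->more_io = 1;          /* what the patch adds */
}

/* pdflush-style driver; returns the number of writepages rounds used. */
static int toy_background_writeout(struct toy_file *f, long budget, int set_more_io)
{
	int rounds = 0;

	assert(f->max_alloc > 0 && budget > 0);  /* otherwise no progress */
	while (f->delayed > 0) {
		struct toy_wbc wbc = { budget, 0, 0 };

		toy_writepages(f, &wbc, set_more_io);
		rounds++;
		if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
			if (!wbc.more_io)
				break;  /* looks like an idle device: give up */
			/* congestion_wait(WRITE, HZ/10) would sleep here */
		}
	}
	return rounds;
}
```

With 10 delayed blocks and `max_alloc = 4`, the model without `more_io` stops after one round and leaves 6 blocks for a later writeback cycle; with `more_io` it keeps going and allocates everything in three rounds. When the budget rather than the allocator limits a round, the loop continues either way, since nothing was skipped.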
