Re: [PATCH v7.1] block: Coordinate flush requests

From: Tejun Heo <tj@kernel.org>
Date: 2011-01-13 10:42:55
Also in: lkml

Hello, Darrick.

On Wed, Jan 12, 2011 at 06:56:46PM -0800, Darrick J. Wong wrote:

On certain types of storage hardware, flushing the write cache takes a
considerable amount of time.  Typically, these are simple storage systems with
write cache enabled and no battery to save that cache during a power failure.
When we encounter a system with many I/O threads that try to flush the cache,
performance is suboptimal because each of those threads issues its own flush
command to the drive instead of trying to coordinate the flushes, thereby
wasting execution time.

Instead of each thread initiating its own flush, we now try to detect the
situation where multiple threads are issuing flush requests.  The first thread
to enter blkdev_issue_flush becomes the owner of the flush, and all threads
that enter blkdev_issue_flush before the flush finishes are queued up to wait
for the next flush.  When that first flush finishes, one of those sleeping
threads is woken up to perform the next flush and then wake up the other
threads which are asleep waiting for the second flush to finish.

Nice work.  :-)

quoted hunk ↗ jump to hunk

 block/blk-flush.c     |  137 +++++++++++++++++++++++++++++++++++++++++--------
 block/genhd.c         |   12 ++++
 include/linux/genhd.h |   15 +++++
 3 files changed, 143 insertions(+), 21 deletions(-)

diff --git a/block/blk-flush.c b/block/blk-flush.c
index 2402a34..d6c9931 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c

@@ -201,6 +201,60 @@ static void bio_end_flush(struct bio *bio, int err)
 	bio_put(bio);
 }
 
+static int blkdev_issue_flush_now(struct block_device *bdev,
+		gfp_t gfp_mask, sector_t *error_sector)
+{
+	DECLARE_COMPLETION_ONSTACK(wait);
+	struct bio *bio;
+	int ret = 0;
+
+	bio = bio_alloc(gfp_mask, 0);
+	bio->bi_end_io = bio_end_flush;
+	bio->bi_bdev = bdev;
+	bio->bi_private = &wait;
+
+	bio_get(bio);
+	submit_bio(WRITE_FLUSH, bio);
+	wait_for_completion(&wait);
+
+	/*
+	 * The driver must store the error location in ->bi_sector, if
+	 * it supports it. For non-stacked drivers, this should be
+	 * copied from blk_rq_pos(rq).
+	 */
+	if (error_sector)
+		*error_sector = bio->bi_sector;
+
+	if (!bio_flagged(bio, BIO_UPTODATE))
+		ret = -EIO;
+
+	bio_put(bio);
+	return ret;
+}

But wouldn't it be better to implement this directly in the flush
machinary instead of as blkdev_issue_flush() wrapper?  We have all the
information at the request queue level so we can easily detect whether
flushes can be merged or not and whether something is issued by
blkdev_issue_flush() or by directly submitting bio wouldn't matter at
all.  Am I missing something?

Thanks.

-- 
tejun

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help