Re: [PATCH v7.1] block: Coordinate flush requests
From: Tejun Heo <tj@kernel.org>
Date: 2011-01-13 10:42:55
Also in:
lkml
Hello, Darrick. On Wed, Jan 12, 2011 at 06:56:46PM -0800, Darrick J. Wong wrote:
On certain types of storage hardware, flushing the write cache takes a considerable amount of time. Typically, these are simple storage systems with write cache enabled and no battery to save that cache during a power failure. When we encounter a system with many I/O threads that try to flush the cache, performance is suboptimal because each of those threads issues its own flush command to the drive instead of trying to coordinate the flushes, thereby wasting execution time. Instead of each thread initiating its own flush, we now try to detect the situation where multiple threads are issuing flush requests. The first thread to enter blkdev_issue_flush becomes the owner of the flush, and all threads that enter blkdev_issue_flush before the flush finishes are queued up to wait for the next flush. When that first flush finishes, one of those sleeping threads is woken up to perform the next flush and then wake up the other threads which are asleep waiting for the second flush to finish.
Nice work. :-)
quoted hunk ↗ jump to hunk
block/blk-flush.c | 137 +++++++++++++++++++++++++++++++++++++++++-------- block/genhd.c | 12 ++++ include/linux/genhd.h | 15 +++++ 3 files changed, 143 insertions(+), 21 deletions(-)diff --git a/block/blk-flush.c b/block/blk-flush.c index 2402a34..d6c9931 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c@@ -201,6 +201,60 @@ static void bio_end_flush(struct bio *bio, int err) bio_put(bio); } +static int blkdev_issue_flush_now(struct block_device *bdev, + gfp_t gfp_mask, sector_t *error_sector) +{ + DECLARE_COMPLETION_ONSTACK(wait); + struct bio *bio; + int ret = 0; + + bio = bio_alloc(gfp_mask, 0); + bio->bi_end_io = bio_end_flush; + bio->bi_bdev = bdev; + bio->bi_private = &wait; + + bio_get(bio); + submit_bio(WRITE_FLUSH, bio); + wait_for_completion(&wait); + + /* + * The driver must store the error location in ->bi_sector, if + * it supports it. For non-stacked drivers, this should be + * copied from blk_rq_pos(rq). + */ + if (error_sector) + *error_sector = bio->bi_sector; + + if (!bio_flagged(bio, BIO_UPTODATE)) + ret = -EIO; + + bio_put(bio); + return ret; +}
But wouldn't it be better to implement this directly in the flush machinary instead of as blkdev_issue_flush() wrapper? We have all the information at the request queue level so we can easily detect whether flushes can be merged or not and whether something is issued by blkdev_issue_flush() or by directly submitting bio wouldn't matter at all. Am I missing something? Thanks. -- tejun