Re: [PATCH 5/5] dm: implement REQ_FLUSH/FUA support
From: Mike Snitzer <hidden>
Date: 2010-08-17 14:07:35
Also in:
dm-devel, linux-fsdevel, linux-raid, linux-scsi, lkml
On Tue, Aug 17 2010 at 5:33am -0400, Tejun Heo [off-list ref] wrote:
Hello, On 08/16/2010 09:02 PM, Mike Snitzer wrote:quoted
On Mon, Aug 16 2010 at 12:52pm -0400, Tejun Heo [off-list ref] wrote:quoted
From: Tejun Heo <redacted> This patch converts dm to support REQ_FLUSH/FUA instead of now deprecated REQ_HARDBARRIER.What tree does this patch apply to? I know it doesn't apply to v2.6.36-rc1, e.g.: http://git.kernel.org/linus/708e929513502fb0(from the head message) These patches are on top of block#for-2.6.36-post (c047ab2dddeeafbd6f7c00e45a13a5c4da53ea0b) + block-replace-barrier-with-sequenced-flush patchset[1] + block-fix-incorrect-bio-request-flag-conversion-in-md patch[2] and available in the following git tree. git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git flush-fua [1] http://thread.gmane.org/gmane.linux.kernel/1022363 [2] http://thread.gmane.org/gmane.linux.kernel/1023435 Probably fetching the git tree is the easist way to review?
OK, I missed this info because I just looked at the DM patch.
quoted
quoted
For bio-based dm, * -EOPNOTSUPP retry logic dropped.That logic wasn't just about retries (at least not in the latest kernel). With commit 708e929513502fb0 the -EOPNOTSUPP checking also serves to optimize the barrier+discard case (when discards aren't supported).With the patch applied, there's no second flush. Those requests would now be REQ_FLUSH + REQ_DISCARD. The first can't be avoided anyway and there won't be the second flush to begin with, so I don't think this worsens anything.
Makes sense, but your patches still need to be refreshed against the latest (2.6.36-rc1) upstream code. Numerous changes went in to DM recently.
quoted
quoted
* Nothing much changes. It just needs to handle FLUSH requests as before. It would be beneficial to advertise FUA capability so that it can propagate FUA flags down to member request_queues instead of sequencing it as WRITE + FLUSH at the top queue.Can you expand on that TODO a bit? What is the mechanism to propagate FUA down to a DM device's members? I'm only aware of propagating member devices' features up to the top-level DM device's request-queue (not the opposite). Are you saying that establishing the FUA capability on the top-level DM device's request_queue is sufficient? If so then why not make the change?Yeah, I think it would be enough to always advertise FLUSH|FUA if the member devices support FLUSH (regardless of FUA support). The reason why I didn't do it was, umm, laziness, I suppose.
I don't buy it.. you're far from lazy! ;)
quoted
quoted
Lightly tested linear, stripe, raid1, snap and crypt targets. Please proceed with caution as I'm not familiar with the code base.This is concerning...Yeap, I want you to be concerned. :-) This was the first time I looked at the dm code and there are many different disjoint code paths and I couldn't fully follow or test all of them, so it definitely needs a careful review from someone who understands the whole thing.
You'll need Mikulas (bio-based) and NEC (request-based, Kiyoshi and Jun'ichi) to give it serious review. NOTE: NEC has already given some preliminary feedback to hch in the "[PATCH, RFC 2/2] dm: support REQ_FLUSH directly" thread: https://www.redhat.com/archives/dm-devel/2010-August/msg00026.html https://www.redhat.com/archives/dm-devel/2010-August/msg00033.html
quoted
if we're to offer more comprehensive review I think we need more detail on what guided your changes rather than details of what the resulting changes are.I'll try to explain it. If you have any further questions, please let me know.
Thanks for the additional details.
* For bio based dm:
* Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA doesn't have any ordering
requirements. Remove assumptions of ordering and/or draining.
A related question: Is dm_wait_for_completion() used in
process_flush() safe against starvation under continuous influx of
other commands?OK, so you folded dm_flush() directly into process_flush() -- the code that was dm_flush() only needs to be called once now. As for your specific dm_wait_for_completion() concern -- I'll defer to Mikulas. But I'll add: we haven't had any reported starvation issues with DM's existing barrier support. DM uses a mempool for its clones, so it should naturally throttle (without starvation) when memory gets low.
* As REQ_FLUSH/FUA doesn't require any ordering of requests before
or after it, on array devices, the latter part - REQ_FUA - can be
handled like other writes. ie. REQ_FLUSH needs to be broadcasted
to all devices but once that is complete the data/REQ_FUA bio can
be sent to only the affected devices. This needs some care as
there are bio cloning/splitting code paths where REQ_FUA bit isn't
preserved.
* Guarantee that REQ_FLUSH w/ data never reaches targets (this in
part is to put it in alignment with request based dm).bio-based DM already split the barrier out from the data (in process_barrier). You've renamed process_barrier to process_flush and added the REQ_FLUSH logic like I'd expect.
* For request based dm:
* The sequencing is done by the block layer for the top level
request_queue, so the only things request based dm needs to make
sure is 1. handling empty REQ_FLUSH correctly (block layer will
only send down empty REQ_FLUSHes) and 2. propagate REQ_FUA bit to
member devices.OK, so seems 1 is done, 2 is still TODO. Looking at your tree it seems 2 would be as simple as using the following in dm_init_request_based_queue (on the most current upstream dm.c): blk_queue_flush(q, REQ_FLUSH | REQ_FUA); (your current patch only sets REQ_FLUSH in alloc_dev).