Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?
From: Qu Wenruo <hidden>
Date: 2021-11-24 07:39:49
Also in:
dm-devel, linux-fsdevel
On 2021/11/24 15:25, Naohiro Aota wrote:
> On Wed, Nov 24, 2021 at 07:07:18AM +0800, Qu Wenruo wrote:
>> On 2021/11/23 22:28, hch@infradead.org wrote:
>>> On Tue, Nov 23, 2021 at 11:39:11AM +0000, Johannes Thumshirn wrote:
>>>> I think we have to differentiate two cases here: A "regular"
>>>> REQ_OP_ZONE_APPEND bio and a RAID stripe REQ_OP_ZONE_APPEND bio.
>>>> The 1st one (i.e. the regular REQ_OP_ZONE_APPEND bio) can't be
>>>> split because we cannot guarantee the order the device writes
>>>> the data to disk.
>>
>> That's correct. But if we want to move all bio split into chunk
>> layer, we want an initial bio without any limitation, and then using
>> that bio to create real REQ_OP_ZONE_APPEND bios with proper size
>> limitations.
>>
>>>> For the RAID stripe bio we can split it into the two (or more)
>>>> parts that will end up on _different_ devices. All we need to do
>>>> is a) ensure it doesn't cross the device's zone append limit and
>>>> b) clamp all bi_iter.bi_sector down to the start of the target
>>>> zone, a.k.a sticking to the rules of REQ_OP_ZONE_APPEND.
>>>
>>> Exactly. A stacking driver must never split a REQ_OP_ZONE_APPEND
>>> bio. But the file system itself can of course split it as long as
>>> each split off bio has its own bi_end_io handler to record where
>>> it has been written to.
>>
>> This makes me wonder, can we really forget the zone thing for the
>> initial bio so we just create a plain bio without any special
>> limitation, and let every split condition be handled in the lower
>> layer? Including raid stripe boundary, zone limitations etc.
>
> What really matters is to ensure the "one bio (for real zoned device)
> == one ordered extent" rule. When a device rewrites ZONE_APPEND bio's
> sector address, we rewrite the ordered extent's logical address
> accordingly in the end_io process. For ensuring the rewriting works,
> one extent must be composed of one contiguous bio.
>
> So, if we can split an ordered extent at the bio splitting process,
> that will be fine. Or, it is also fine if we can split an ordered
> extent at end_bio process. But, I think it is difficult because
> someone can be already waiting for the ordered extent, and splitting
> it at that point will break some assumptions in the code.

OK, I see the problem now.

It's extract_ordered_extent() that relies on the zoned append bio to
split the ordered extents, not the other way around. So splitting the
bio in the chunk layer will still be more complex than I thought.

I'll leave the zoned part untouched for now until I have a better
solution.

Thanks,
Qu

>> (yeah, it's still not pure stacking driver, but it's more
>> stacking-driver like).
>>
>> In that case, the missing piece seems to be a way to convert a
>> split plain bio into a REQ_OP_ZONE_APPEND bio.
>>
>> Can this be done without slow bvec copying?
>>
>> Thanks,
>> Qu
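
For readers following the "one bio == one ordered extent" rule discussed
above, here is a minimal userspace sketch in C. It is only a toy model
under stated assumptions, not btrfs or block layer code: the types
toy_zone and toy_extent and the helpers toy_split(), toy_zone_append()
and toy_end_io() are all hypothetical names made up for illustration.
It shows a write being cut into pieces no larger than an assumed zone
append limit, each piece targeting the zone start, and each piece's
extent being rewritten with the sector the (simulated) device reports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; NOT kernel structures. */
struct toy_zone {
	uint64_t start;	/* first sector of the zone */
	uint64_t wp;	/* write pointer, in sectors */
	uint64_t len;	/* zone length, in sectors */
};

struct toy_extent {
	uint64_t file_off;	/* offset of this piece in the write */
	uint64_t logical;	/* target sector, rewritten at end_io */
	uint64_t len;		/* length in sectors */
};

/*
 * Cut a write of 'total' sectors into pieces of at most 'max_append'
 * sectors, one toy extent per piece. Every piece is aimed at the zone
 * start, mimicking the clamping described for zone append. Returns the
 * number of pieces stored; stops early if 'out' has no room left.
 */
static size_t toy_split(uint64_t file_off, uint64_t total,
			uint64_t max_append, uint64_t zone_start,
			struct toy_extent *out, size_t cap)
{
	size_t n = 0;

	while (total > 0 && n < cap) {
		uint64_t len = total < max_append ? total : max_append;

		out[n].file_off = file_off;
		out[n].logical = zone_start;
		out[n].len = len;
		file_off += len;
		total -= len;
		n++;
	}
	return n;
}

/*
 * Simulated device: data lands at the current write pointer, and the
 * sector actually used is reported back. Returns -1 if the zone is full.
 */
static int toy_zone_append(struct toy_zone *z, uint64_t len,
			   uint64_t *written)
{
	if (z->wp + len > z->start + z->len)
		return -1;
	*written = z->wp;
	z->wp += len;
	return 0;
}

/* Completion side: record where this one piece really ended up. */
static void toy_end_io(struct toy_extent *oe, uint64_t written)
{
	assert(oe != NULL);
	oe->logical = written;
}
```

In this model the pieces can complete in any order: each one carries its
own extent, so each completion can record its own location without
touching the others. That is the property the thread says is lost if a
single ordered extent spans several zone append bios.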