Thread: 15 messages, 4 authors, 2021-11-26

Re: Any bio_clone_slow() implementation which doesn't share bi_io_vec?

From: Johannes Thumshirn <hidden>
Date: 2021-11-23 11:39:16
Also in: dm-devel, linux-fsdevel

On 23/11/2021 12:09, Qu Wenruo wrote:

On 2021/11/23 16:13, Christoph Hellwig wrote:
On Tue, Nov 23, 2021 at 04:10:35PM +0800, Qu Wenruo wrote:
Going without bio_chain() sounds pretty good, as we can still utilize
bi_end_io and bi_private.

But this also means we're now responsible for not releasing the source bio,
since it owns the real bi_io_vec.

Just call bio_inc_remaining() on the root bio before submitting each cloned
bio, and then call bio_endio() on the root bio every time a clone completes.

Yeah, that sounds pretty good for regular usage.
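Christoph's suggestion works because the root bio is reference counted: it cannot complete (and so the bi_io_vec the clones share stays valid) until every clone has dropped its reference. The following is a userspace model, not kernel code. `struct model_root_bio`, `model_inc_remaining()` and `model_endio()` are hypothetical stand-ins for the root's `__bi_remaining` counter, `bio_inc_remaining()` and `bio_endio()`. It also assumes the submitter keeps the root's initial reference and drops it with one final end call after all clones are submitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the root bio: only the completion
 * counter (modelling __bi_remaining) and a flag recording that the
 * root's own completion work has run. */
struct model_root_bio {
    atomic_int remaining;
    bool ended;
};

static void model_root_init(struct model_root_bio *root)
{
    /* Assumption: the submitter owns the initial reference. */
    atomic_init(&root->remaining, 1);
    root->ended = false;
}

/* Models bio_inc_remaining(): take one extra reference per clone,
 * before that clone is submitted. */
static void model_inc_remaining(struct model_root_bio *root)
{
    atomic_fetch_add(&root->remaining, 1);
}

/* Models bio_endio() on the root: drop one reference. Only the final
 * drop runs the root's completion, so the root stays alive until
 * every clone (and the submitter) has let go of it. */
static void model_endio(struct model_root_bio *root)
{
    int old = atomic_fetch_sub(&root->remaining, 1);

    assert(old > 0);
    if (old == 1)
        root->ended = true;
}
```

With three clones in flight the counter reaches 4; the root only ends after the three clone completions plus the submitter's own final drop.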

But there is another very tricky case involved.

Btrfs supports zoned devices, so we have special call sites that
switch between bio_add_page() and bio_add_zone_append_page().

But zone append writes can't be split, nor is there an easy way to directly
convert a regular bio into a bio with zone append pages.

Currently, if we take the slow path (allocating a new bio, adding pages
from the original bio, then advancing the original bio), we are able to
convert a regular bio into a zone append bio.
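The slow path copies segments into a bio that owns its own vector instead of sharing bi_io_vec, which is why it can build a zone append bio. Here is a userspace model of that copy-and-advance loop. `struct model_seg`, `struct model_vec_bio` and `model_slow_clone()` are hypothetical. `max_bytes` stands in for whatever cap the target imposes (for a zone append bio, the device's zone append limit). Each copied segment corresponds to a bio_add_page() or bio_add_zone_append_page() call in the kernel, and the model ignores the failure cases those calls can hit.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_SEGS 16

/* Hypothetical stand-in for a bio_vec: page identifier, offset, length. */
struct model_seg {
    unsigned long page;
    unsigned int offset;
    unsigned int len;
};

/* Hypothetical bio that owns its vector (nothing is shared), plus a
 * minimal iterator: current segment index and bytes already consumed. */
struct model_vec_bio {
    struct model_seg vec[MODEL_MAX_SEGS];
    size_t vcnt;
    size_t idx;
    unsigned int done;
};

/* Slow-path clone: copy up to max_bytes from src's current position into
 * the fresh bio dst, then advance src past the copied bytes. */
static unsigned int model_slow_clone(struct model_vec_bio *src,
                                     struct model_vec_bio *dst,
                                     unsigned int max_bytes)
{
    unsigned int copied = 0;

    dst->vcnt = 0;
    dst->idx = 0;
    dst->done = 0;
    while (src->idx < src->vcnt && copied < max_bytes &&
           dst->vcnt < MODEL_MAX_SEGS) {
        const struct model_seg *s = &src->vec[src->idx];
        unsigned int avail, take;

        assert(src->done < s->len); /* iterator never rests at a segment end */
        avail = s->len - src->done;
        take = avail < max_bytes - copied ? avail : max_bytes - copied;

        /* "Add the page" to the new bio, starting where src left off. */
        dst->vec[dst->vcnt++] = (struct model_seg){
            .page = s->page, .offset = s->offset + src->done, .len = take };
        copied += take;

        /* "Advance" the original bio past what was just copied. */
        src->done += take;
        if (src->done == s->len) {
            src->idx++;
            src->done = 0;
        }
    }
    return copied;
}
```

Calling it repeatedly on the same source carves the original into independent pieces, each with its own vector, until `src->idx` reaches `src->vcnt`.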

Any ideas on this corner case?

I think we have to differentiate between two cases here:
A "regular" REQ_OP_ZONE_APPEND bio and a RAID stripe REQ_OP_ZONE_APPEND
bio. The first one (i.e. the regular REQ_OP_ZONE_APPEND bio) can't be split
because we cannot guarantee the order in which the device writes the data to disk.
The RAID stripe bio, however, can be split into the two (or more) parts that
will end up on _different_ devices. All we need to do is a) ensure each part
doesn't exceed the device's zone append limit and b) clamp each part's
bi_iter.bi_sector down to the start of its target zone, i.e. stick to
the rules of REQ_OP_ZONE_APPEND.
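Johannes' two rules for the RAID stripe case can be modelled in userspace. This is a sketch under stated assumptions: a RAID0-style layout, all sizes in 512-byte sectors, and `struct model_stripe_layout`, `struct model_append_part` and `model_split_zone_append()` are hypothetical. Each part stays within one stripe element (so it lands on exactly one device), is capped at the zone append limit (rule a), and has its start sector rounded down to the start of its target zone (rule b); with zone append, the device then chooses the actual write location inside that zone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RAID0-style layout; all sizes in 512-byte sectors. */
struct model_stripe_layout {
    uint64_t stripe_sectors;     /* sectors per stripe element */
    unsigned int num_devices;
    uint64_t zone_sectors;       /* zone size, same on every device */
    uint64_t max_append_sectors; /* models the device's zone append limit */
};

struct model_append_part {
    unsigned int dev;    /* target device */
    uint64_t sector;     /* start of the target zone (what bi_sector would hold) */
    uint64_t nr_sectors; /* length of this part */
};

/* Split [logical, logical + len) into parts that never cross a stripe
 * element (and therefore a device) and never exceed max_append_sectors.
 * Returns the number of parts written to out (at most max_parts). */
static unsigned int model_split_zone_append(const struct model_stripe_layout *l,
                                            uint64_t logical, uint64_t len,
                                            struct model_append_part *out,
                                            unsigned int max_parts)
{
    unsigned int n = 0;

    assert(l->stripe_sectors && l->num_devices && l->zone_sectors &&
           l->max_append_sectors);
    while (len > 0 && n < max_parts) {
        uint64_t stripe_nr = logical / l->stripe_sectors;
        uint64_t in_stripe = logical % l->stripe_sectors;
        uint64_t chunk = l->stripe_sectors - in_stripe;
        uint64_t physical;

        if (chunk > l->max_append_sectors) /* rule a) */
            chunk = l->max_append_sectors;
        if (chunk > len)
            chunk = len;

        physical = (stripe_nr / l->num_devices) * l->stripe_sectors + in_stripe;
        out[n].dev = (unsigned int)(stripe_nr % l->num_devices);
        out[n].sector = physical - physical % l->zone_sectors; /* rule b) */
        out[n].nr_sectors = chunk;
        n++;

        logical += chunk;
        len -= chunk;
    }
    return n;
}
```

For example, with 128-sector stripe elements over two devices and a 64-sector append limit, a 96-sector write at logical sector 96 becomes a 32-sector part on device 0 and a 64-sector part on device 1, each addressed to the start of its zone.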