Re: About the md-bitmap behavior
From: Qu Wenruo <hidden>
Date: 2022-06-22 02:37:46
Also in:
linux-block
On 2022/6/22 10:15, Doug Ledford wrote:
> On Mon, 2022-06-20 at 10:56 +0100, Wols Lists wrote:
>> On 20/06/2022 08:56, Qu Wenruo wrote:
>>>> The write-hole has been addressed with journaling already, and this will be adding a new and not-needed feature - not saying it wouldn't be nice to have, but do we need another way to skin this cat?
>>>
>>> I'm talking about the BTRFS RAID56, not md-raid RAID56, which is a completely different thing. Here I'm just trying to understand how the md-bitmap works, so that I can do a proper bitmap for btrfs RAID56.
>>
>> Ah. Okay. Neil Brown is likely to be the best help here as I believe he wrote a lot of the code, although I don't think he's much involved with md-raid any more.
>
> I can't speak to how it is today, but I know it was *designed* to be sync flush of the dirty bit setting, then lazy, async write out of the clear bits. But, yes, in order for the design to be reliable, you must flush out the dirty bits before you put writes in flight.
Thank you very much for confirming my concern. So maybe I just haven't read the md-bitmap code carefully enough to see the full picture.
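To make the ordering requirement concrete, here is a minimal Python sketch of a write-intent bitmap: dirty bits are flushed synchronously before the data write is issued, and bits are cleared lazily. This is a toy model for discussion, not the md code; the class, its method names, and the region size are all hypothetical.

```python
class WriteIntentBitmap:
    """Toy model of a write-intent bitmap; NOT the md implementation."""

    def __init__(self, region_size):
        self.region_size = region_size  # bytes covered by one bit (assumed)
        self.in_memory = set()          # regions marked dirty in memory
        self.on_disk = set()            # regions marked dirty on stable storage
        self.in_flight = {}             # region -> number of writes in flight
        self.log = []                   # event order, for inspection only

    def _regions(self, offset, length):
        first = offset // self.region_size
        last = (offset + length - 1) // self.region_size
        return range(first, last + 1)

    def flush(self):
        # Stand-in for a synchronous bitmap write plus device flush.
        self.on_disk = set(self.in_memory)
        self.log.append("bitmap_flush")

    def start_write(self, offset, length):
        need_flush = False
        for r in self._regions(offset, length):
            self.in_flight[r] = self.in_flight.get(r, 0) + 1
            self.in_memory.add(r)
            if r not in self.on_disk:
                need_flush = True
        if need_flush:
            self.flush()  # dirty bits must be stable before any data write
        self.log.append("data_write")

    def end_write(self, offset, length):
        for r in self._regions(offset, length):
            self.in_flight[r] -= 1
            if self.in_flight[r] == 0:
                del self.in_flight[r]
                # Lazy clear: the on-disk bit only goes away at a later flush.
                self.in_memory.discard(r)
```

In this model, `on_disk` is always a superset of the regions with writes in flight, so after a crash it is enough to resync only the regions still set there.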
> One thing I'm not sure about though, is that MD RAID5/6 uses fixed stripes. I thought btrfs, since it was an allocation filesystem, didn't have to use full stripes? Am I wrong about that?
Unfortunately, we only do allocation at the RAID56 chunk level. Inside a RAID56 chunk, the underlying devices still follow the regular RAID56 full-stripe scheme. Thus btrfs RAID56 is still the same regular RAID56 inside one btrfs RAID56 chunk, but without a bitmap/journal.
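As an illustration of the fixed full-stripe layout inside a chunk, the sketch below maps a write (as a byte range in the chunk's data space) to the full stripes it touches. The function name is hypothetical, and the 64 KiB per-device stripe length and the geometry are assumptions chosen for the example.

```python
STRIPE_LEN = 64 * 1024  # assumed per-device stripe length


def full_stripes_touched(offset, length, num_devices, num_parity=1):
    """Return indices of the full stripes covered by [offset, offset + length).

    offset is relative to the start of the chunk's data space.
    num_parity is 1 for RAID5 and 2 for RAID6.
    """
    data_per_full_stripe = (num_devices - num_parity) * STRIPE_LEN
    first = offset // data_per_full_stripe
    last = (offset + length - 1) // data_per_full_stripe
    return list(range(first, last + 1))
```

Even a 4 KiB write lands in exactly one full stripe whose parity has to be updated, which is why a bitmap that tracks full stripes (or ranges of them) still fits.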
> Because it would seem that if your data isn't necessarily in full stripes, then a bitmap might not work so well since it just marks a range of full stripes as "possibly dirty, we were writing to them, do a parity resync to make sure".
The resync part is where btrfs shines: the extra csum (for the untouched part) and metadata COW ensure that we only see the old, untouched data, and with the extra csum we can safely rebuild the full stripe. Thus, as long as no device is missing, a write-intent bitmap is enough to address the write hole in btrfs (at least for COW-protected data and all metadata).
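A rough Python sketch of that resync argument for the RAID5 (single XOR parity) case: verify every data stripe that still has a checksum, then recompute parity from the data. This is an illustration only, not btrfs code; the function names are hypothetical, and `zlib.crc32` is just a stand-in for the real csum.

```python
import zlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def resync_full_stripe(data_stripes, csums):
    """Recompute parity for one full stripe marked dirty in the bitmap.

    csums[i] is the expected checksum of data_stripes[i], or None when no
    committed extent references that stripe (for example a torn COW write).
    """
    for i, (block, expected) in enumerate(zip(data_stripes, csums)):
        if expected is not None and zlib.crc32(block) != expected:
            raise ValueError(f"data stripe {i} fails its checksum")
    return xor_blocks(data_stripes)
```

Stripes without a checksum are not referenced by committed metadata, so their content does not matter; the only requirement is that parity matches whatever is on disk afterwards.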
> In any case, Wols is right, probably want to ping Neil on this. Might need to ping him directly though. Not sure he'll see it just on the list.
Adding Neil to this thread. Any clue about the existing md_bitmap_startwrite() behavior?

Thanks,
Qu