Re: raid0 confusion question
From: Zygo Blaxell <hidden>
Date: 2021-01-17 01:40:05
On Sat, Jan 16, 2021 at 03:08:00PM -0600, Tim Cuthbertson wrote:
I thought raid0 "striped" the data across two or more devices to increase total capacity, for example when adding a new device to an existing filesystem. But that is not apparently what I ended up with. Before: btrfs device usage /mnt/backup/ /dev/sdc1, ID: 1 Device size: 300.00GiB Device slack: 0.00B Data,single: 226.01GiB Metadata,DUP: 8.00GiB System,DUP: 64.00MiB Unallocated: 65.93GiB /dev/sdc2, ID: 2 Device size: 300.00GiB Device slack: 0.00B Data,single: 1.00GiB Unallocated: 299.00GiB Then, I ran command: btrfs balance start -dconvert=raid0 -mconvert=raid1 /mnt/backup And what I ended up with seems to be double the amount of data used, like what I think would happen with raid1, not raid0: btrfs device usage /mnt/backup/ /dev/sdc1, ID: 1 Device size: 300.00GiB Device slack: 0.00B Data,RAID0: 228.00GiB Metadata,RAID1: 5.00GiB System,RAID1: 64.00MiB Unallocated: 66.94GiB /dev/sdc2, ID: 2 Device size: 300.00GiB Device slack: 0.00B Data,RAID0: 228.00GiB Metadata,RAID1: 5.00GiB System,RAID1: 64.00MiB Unallocated: 66.94GiB Or, am I misinterpreting what I am seeing? Thank you.
btrfs divides disks into 1 GiB slices (on disks of this size), and the joins the slices together to make chunks with a RAID profile. Data and metadata is then stored inside the chunks. btrfs dev usage will show you the size of the chunks, not the amount of (meta)data inside the chunks.
# uname -a
Linux tux 5.10.7-arch1-1 #1 SMP PREEMPT Wed, 13 Jan 2021 12:02:01
+0000 x86_64 GNU/Linux
# btrfs --version
btrfs-progs v5.9
# btrfs fi show
Label: none uuid: c0f4c8e2-b580-4c0d-9562-abdb933b9625
Total devices 1 FS bytes used 13.11GiB
devid 1 size 449.51GiB used 14.01GiB path /dev/sda3
Label: none uuid: 4fe39403-7ba1-4f22-972f-5041e3b6ff6f
Total devices 1 FS bytes used 37.36GiB
devid 1 size 600.00GiB used 40.02GiB path /dev/sdb1
Label: none uuid: 1751eeca-c1a2-47bb-906b-c7199b09eb6d
Total devices 2 FS bytes used 229.57GiB
devid 1 size 300.00GiB used 233.06GiB path /dev/sdc1
devid 2 size 300.00GiB used 233.06GiB path /dev/sdc2btrfs fi show also reports chunk sizes (228 + 5 + 0.06 = 233.06). The difference between the btrfs device size and the amount of data allocated is called "unallocated" in 'btrfs dev usage' and 'btrfs fi usage'. The difference between btrfs device size and the physical device size is called "slack" in 'btrfs dev usage' output (it does not appear in 'fi usage' output).
# btrfs fi df /mnt/backup Data, RAID0: total=456.00GiB, used=226.65GiB System, RAID1: total=64.00MiB, used=64.00KiB Metadata, RAID1: total=5.00GiB, used=2.92GiB GlobalReserve, single: total=401.84MiB, used=0.00B
In 'btrfs fi df' output, 'total' is the size of chunks allocated, 'used' is the amount of space used within the chunks (GlobalReserve is deducted from metadata in RAM, it doesn't physically exist on any disk). If you had shown 'btrfs fi usage' here, it might be clearer. 'fi usage' combines 'dev usage' with 'fi df', and it indicates how much data is stored in each profile separately from how much chunk space is allocated on each disk. Plain 'df' should be showing the same amount of available space, maybe a few GB different due to the metadata balance. It is still storing 226 GiB of data, but it has allocated a larger number of chunks (i.e. one chunk out for each chunk in, but the input chunks are 1GB and the output chunks are 2x1GB, so each chunk is half full). 1GB of data at the beginning of the new chunk and then 1GB of empty space, give or take a few blocks. It's probably harmless, but if you want to waste a lot of iops for nothing, you can balance the data again and it should pack the data into chunks more tightly. I wouldn't bother--the space is allocated for data, so if you add more data to the filesystem it will just fill in those chunks. If you do a data balance, it will repack the data into the data chunks so that allocated space is closer to used space, but then later as you add more data to the filesystem new data chunks will have to be created again. There's no fragmentation concern since the free space areas are likely all contiguous 1 GiB or larger. Same thing has happened with metadata, 5GiB allocated for ~3GiB used. Definitely do NOT balance the metadata (only balance metadata to change RAID profiles) because you'll need that extra 2GiB to be preallocated for metadata as the disk fills up.