Re: migrating to space_cache=2 and btrfs userspace commands
From: Joshua <hidden>
Date: 2021-07-15 18:00:05
Just as a data point: I have a 96 TB array with RAID1 data and RAID1C3 metadata. I switched to space_cache=v2 some time ago, and I remember it made a huge difference! (The metadata was RAID1 back then, as RAID1C3 was not yet available.)

However, at that time I also tried running a check with '--clear-space-cache v1', and after waiting literally a whole day without it completing, I gave up, cancelled it, and put the array back into production. Is a --clear-space-cache v1 operation expected to take this long on such a large filesystem?

Thanks!
--Joshua Villwock

July 15, 2021 9:40 AM, "DanglingPointer" [off-list ref] wrote:
Hi Qu,

Just an update: setting the mount options "space_cache=v2" and "noatime" completely SOLVED the performance problem! It's like night and day! These are my full fstab mount options:

    btrfs defaults,autodefrag,space_cache=v2,noatime 0 2

Perhaps making space_cache=v2 the default should be considered? Why default to v1? What's the value of v1?

So, in conclusion: for large multi-terabyte arrays (RAID5 in my case), setting space_cache=v2 and noatime massively increases performance and eliminates the long pauses that recurred at frequent intervals while "btrfs-transacti" blocked all IO.

Thanks Qu for your help!

On 14/7/21 5:45 pm, Qu Wenruo wrote:
On 2021/7/14 3:18 PM, DanglingPointer wrote:
> a) "echo l > /proc/sysrq-trigger"
> Unfortunately the backup already finished today, and we are unlikely to run it again until we get an outage to remount the array with the space_cache=v2 and noatime mount options. Thanks for the command; we'll definitely use it if/when this happens again during the next large data migration.

Just to avoid confusion: after that command, the "dmesg" output is still needed, as that's where sysrq puts its output.
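A minimal sketch of the capture sequence described above, assuming it is run as root while a pause is in progress: write "l" to the sysrq trigger (backtraces of all active CPUs go to the kernel log), then save the kernel log with dmesg. The helper name capture_sysrq_l and its output-file argument are hypothetical; the trigger path is a parameter only so the logic can be exercised without root.

```shell
#!/bin/sh
# capture_sysrq_l OUTFILE [TRIGGER]: hypothetical helper, not part of any tool.
# Writes "l" to the sysrq trigger, which makes the kernel print a backtrace of
# every active CPU into its log, then saves the kernel log to OUTFILE.
# Needs root; run it while the btrfs-transacti pause is happening.
capture_sysrq_l() {
    outfile=$1
    trigger=${2:-/proc/sysrq-trigger}
    echo l > "$trigger" || return 1
    sleep 1              # give the kernel a moment to finish printing
    dmesg > "$outfile"   # sysrq output lands in the kernel log, not on stdout
}

# Example (as root, during a pause):
#   capture_sysrq_l /root/sysrq-l-$(date +%s).txt
```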
> b) "sudo btrfs qgroup show -prce" ........
> ERROR: can't list qgroups: quotas not enabled
> So it looks like it isn't enabled.

One less thing to worry about.
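The same check could be scripted, for instance as a guard in a maintenance job. This is a sketch, assuming the error text matches the "quotas not enabled" message quoted above; the function name qgroups_enabled and the BTRFS override variable are hypothetical, the latter existing only so the logic can be exercised with a stub instead of a real filesystem.

```shell
#!/bin/sh
# qgroups_enabled MNT: hypothetical helper, not part of btrfs-progs.
# Returns 0 if qgroups are enabled on the btrfs mounted at MNT, 1 if
# btrfs reports "quotas not enabled", and 2 for any other failure.
# BTRFS may be overridden (e.g. with a stub function) for testing.
BTRFS=${BTRFS:-btrfs}

qgroups_enabled() {
    out=$("$BTRFS" qgroup show -prce "$1" 2>&1)
    status=$?
    case $out in
        *"quotas not enabled"*) return 1 ;;  # the message quoted in this thread
    esac
    [ "$status" -eq 0 ] || return 2          # some other error (not btrfs, etc.)
    return 0
}

# Example (as root; the mount point is a placeholder):
#   qgroups_enabled /mnt/array && echo "qgroups on: accounting runs at commit time"
```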
> File sizes are between 1,048,576 and 16,777,216 bytes (Duplicacy backup defaults).

Between 1 and 16 MiB, so tons of small files. Btrfs is not really good at handling tons of small files, as they generate a lot of metadata. That may contribute to the hang.
> What classifies as a transaction?

It's a little complex. Technically it's a checkpoint: before the checkpoint, all you see is old data; after the checkpoint, all you see is new data. To end users, any data and metadata writes will be included in one transaction (with proper dependencies handled). One way to finish (or commit) the current transaction is to sync the fs, using the "sync" command (which syncs all filesystems).
> Any/all writes done in a 30sec interval?

This is the default commit interval. Almost all filesystems will try to commit their data/metadata to disk after a configurable interval; the default is 30s. That's also one way the current transaction gets committed.
> If 100 unique files were written in 30secs, is that 1 transaction or 100 transactions?

It depends, as things like syncfs() and subvolume/snapshot creation may try to commit the transaction. But without those special operations, writing 100 unique files using buffered writes would start only one transaction, and when the 30s interval is hit, that transaction will be committed to disk.
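To see which commit interval a mounted filesystem is actually using, its mount options can be inspected. A small sketch, assuming the btrfs(5) "commit=<seconds>" mount option (default 30) and assuming the kernel only lists commit= in the mount options when it differs from the default; commit_interval is a hypothetical helper and /mnt/array a placeholder. Independently of the interval, "sync" (all filesystems) or "btrfs filesystem sync <path>" (one filesystem) forces the current transaction to be committed.

```shell
#!/bin/sh
# commit_interval OPTS: hypothetical helper. Prints the transaction commit
# interval (in seconds) implied by a comma-separated btrfs mount-option
# string, falling back to 30, the default documented in btrfs(5).
commit_interval() {
    interval=30
    old_ifs=$IFS
    IFS=,
    for opt in $1; do
        case $opt in
            commit=*) interval=${opt#commit=} ;;
        esac
    done
    IFS=$old_ifs
    echo "$interval"
}

# Example (mount point is a placeholder):
#   commit_interval "$(findmnt -no OPTIONS /mnt/array)"
#   sync                               # commit now, all filesystems
#   btrfs filesystem sync /mnt/array   # commit now, this filesystem only
```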
> Millions of files in the size range above were backed up.

The number of files may not force a transaction commit if it doesn't trigger enough memory pressure or free space pressure. Anyway, the "echo l" sysrq output would help us locate what's taking so long.
> c) "Just mount with "space_cache=v2""
> OK, so no need to "clear_cache" the v1 cache, right?

Yes, and "clear_cache" won't really remove all of the v1 cache anyway, so it doesn't help much. The only way to fully clear the v1 cache is to run "btrfs check --clear-space-cache v1" on an *unmounted* btrfs.
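Putting the two answers together (clearing v1 is optional, switching only needs the mount option), a migration could look like the following sketch. DEV and MNT are hypothetical placeholders; by default the script only prints the commands, and a real run needs root and downtime, because btrfs check requires the filesystem to be unmounted. As Joshua reports at the top of this thread, the check step can take a very long time on a large filesystem.

```shell
#!/bin/sh
# Sketch of switching an existing filesystem from the v1 space cache to v2.
# DEV and MNT are hypothetical placeholders. With DRY_RUN=1 (the default)
# the commands are only printed; set DRY_RUN=0 and run as root to execute.
DEV=${DEV:-/dev/mapper/example}
MNT=${MNT:-/mnt/array}
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

migrate_to_v2() {
    run umount "$MNT" || return 1
    # Optional, and only on the unmounted filesystem: fully removes the v1
    # cache, which the clear_cache mount option does not.
    run btrfs check --clear-space-cache v1 "$DEV" || return 1
    # Mounting with space_cache=v2 switches to the free space tree.
    run mount -o space_cache=v2 "$DEV" "$MNT"
}

migrate_to_v2
```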
> I wrote this in the fstab but haven't remounted yet; waiting until I can get an outage....

IMHO if you really want to test whether v2 helps, you can just remount; no need to wait for a break.

Thanks,
Qu
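After remounting, one way to confirm that v2 is actually in effect is to check the options the kernel reports for the mount: with the free space tree btrfs shows "space_cache=v2", while v1 appears as plain "space_cache". A minimal sketch, assuming util-linux findmnt is available; has_mount_opt and the mount point /mnt/array are hypothetical.

```shell
#!/bin/sh
# has_mount_opt OPTS OPT: hypothetical helper. Succeeds if the comma-separated
# option list OPTS contains OPT as a whole option, so "space_cache" does not
# match "space_cache=v2" and vice versa.
has_mount_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (the mount point is a placeholder):
#   opts=$(findmnt -no OPTIONS /mnt/array)
#   if has_mount_opt "$opts" space_cache=v2; then
#       echo "free space tree (v2) in use"
#   fi
```

Matching whole options (by wrapping both sides in commas) avoids the false positive a plain substring search would give for "space_cache" inside "space_cache=v2".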
> ..."btrfs defaults,autodefrag,clear_cache,space_cache=v2,noatime 0 2"
>
> Thanks again for your help Qu!
>
> On 14/7/21 2:59 pm, Qu Wenruo wrote:
>> On 2021/7/13 11:38 PM, DanglingPointer wrote:
>>> We're currently considering switching to "space_cache=v2" with the noatime mount option for my lab server-workstations running RAID5.
>>
>> Btrfs RAID5 is unsafe due to its write-hole problem.
>>
>>> * One has 13TB of data/metadata on a mix of 6TB and 2TB disks totalling 26TB.
>>> * Another has about 12TB of data/metadata on uniformly sized 6TB disks totalling 24TB.
>>> * Both arrays are on individually LUKS-encrypted disks, with btrfs on top of LUKS.
>>> * Both have "defaults,autodefrag" turned on in fstab.
>>>
>>> We're starting to see large pauses during constant backups of millions of chunk files (using duplicacy backup) in the 24TB array. Pauses sometimes last 20+ seconds and recur roughly every 30 seconds after the end of the previous pause. The "btrfs-transacti" process consistently shows up as the blocking process/thread locking up filesystem IO. IO gets into the RAID5 array via nfsd. There are no disk or btrfs errors recorded, and the last scrub finished successfully yesterday.
>>
>> Please provide the "echo l > /proc/sysrq-trigger" output when such a pause happens.
>>
>> If you're using qgroup (it may be enabled by things like snapper), it may be the cause, as qgroup does its accounting when committing a transaction. If one transaction is super large, it can cause such a problem.
>>
>> You can test if qgroup is enabled by:
>>
>> # btrfs qgroup show -prce <mnt>
>>
>>> After doing some research around the internet, we've come to the considerations described above. Unfortunately the official documentation isn't clear on the following.
>>> Official documentation URL - https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)
>>>
>>> 1. How do we migrate from the default space_cache=v1 to space_cache=v2? The documentation talks about the reverse, from v2 to v1!
>>
>> Just mount with "space_cache=v2".
>>
>>> 2. If we use space_cache=v2, is it indeed still the case that the "btrfs" command will NOT work with the filesystem?
>>
>> Why would you think "btrfs" won't work on a btrfs?
>>
>> Thanks,
>> Qu
>>
>>> So will our "btrfs scrub start /mount/point/..." cron jobs FAIL? I'm guessing the btrfs command comes from btrfs-progs, which is currently v5.4.1-2 amd64; is that correct?
>>>
>>> 3. Any other ideas on how we can get rid of those annoying pauses during large backups into the array?
>>>
>>> Thanks in advance!
>>> DP