Re: migrating to space_cache=2 and btrfs userspace commands
From: DanglingPointer <hidden>
Date: 2021-07-14 07:18:40
a) "echo l > /proc/sysrq-trigger"

Unfortunately the backup already finished today, and we are unlikely to run it again until we get an outage to remount the array with the space_cache=v2 and noatime mount options. Thanks for the command; we'll definitely use it if/when this happens again on the next large migration of data.

b) "sudo btrfs qgroup show -prce" ........

$ ERROR: can't list qgroups: quotas not enabled

So it looks like it isn't enabled.

File sizes are between 1,048,576 bytes and 16,777,216 bytes (Duplicacy backup defaults).

What classifies as a transaction? Any/all writes done in a 30-second interval? If 100 unique files were written in 30 seconds, is that 1 transaction or 100 transactions? Millions of files in the size range above were backed up.

c) "Just mount with "space_cache=v2""

OK, so no need to "clear_cache" the v1 cache, right? I wrote this in the fstab but haven't remounted yet; that has to wait until I can get an outage....

..."btrfs defaults,autodefrag,clear_cache,space_cache=v2,noatime 0 2"

Thanks again for your help Qu!

On 14/7/21 2:59 pm, Qu Wenruo wrote:
> On 2021/7/13 11:38 PM, DanglingPointer wrote:
>> We're currently considering switching to "space_cache=v2" with noatime mount options for my lab server-workstations running RAID5.
>
> Btrfs RAID5 is unsafe due to its write-hole problem.
>> * One has 13TB of data/metadata in a bunch of 6TB and 2TB disks totalling 26TB.
>> * Another has about 12TB of data/metadata in uniformly sized 6TB disks totalling 24TB.
>> * Both of the arrays are on individually luks encrypted disks with btrfs on top of the luks.
>> * Both have "defaults,autodefrag" turned on in fstab.
>>
>> We're starting to see large pauses during constant backups of millions of chunk files (using duplicacy backup) in the 24TB array.
>>
>> Pauses sometimes last 20+ seconds and recur roughly 30 seconds after the end of the previous pause. The "btrfs-transacti" process consistently shows up as the blocking process/thread locking up filesystem IO. IO gets into the RAID5 array via nfsd. There are no disk or btrfs errors recorded. The last scrub finished successfully yesterday.
>
> Please provide the "echo l > /proc/sysrq-trigger" output when such a pause happens.
>
> If you're using qgroup (it may be enabled by things like snapper), it may be the cause, as qgroup does its accounting when committing a transaction.
>
> If one transaction is super large, it can cause such a problem.
>
> You can test whether qgroup is enabled with:
>
> # btrfs qgroup show -prce <mnt>
>> After doing some research around the internet, we arrived at the option described above. Unfortunately the official documentation isn't clear on the following.
>>
>> Official documentation URL - https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)
>>
>> 1. How to migrate from the default space_cache=v1 to space_cache=v2? It talks about the reverse, from v2 to v1!
>
> Just mount with "space_cache=v2".
>> 2. If we use space_cache=v2, is it indeed still the case that the "btrfs" command will NOT work with the filesystem?
>
> Why would you think "btrfs" won't work on a btrfs?
>
> Thanks,
> Qu
>> So will our "btrfs scrub start /mount/point/..." cron jobs FAIL? I'm guessing the btrfs command comes from btrfs-progs which is currently v5.4.1-2 amd64, is that correct?
>>
>> 3. Any other ideas on how we can get rid of those annoying pauses with large backups into the array?
>>
>> Thanks in advance!
>>
>> DP
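
For whoever finds this thread later: a minimal sketch of checking that a mount option string actually contains the options discussed above (space_cache=v2 and noatime). The helper name has_mount_opt is made up for illustration, and the option string is just the one from the fstab line quoted in this mail. After the remount, the live options could presumably be read with "findmnt -no OPTIONS <mnt>" instead, where the kernel may report them slightly differently than fstab does.

```shell
# Hypothetical helper (not part of btrfs-progs): succeeds if the
# comma-separated option list in $1 contains exactly the option $2.
has_mount_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Options field copied from the fstab line quoted in this thread.
# To check a live mount instead (assumption: util-linux findmnt is installed):
#   opts=$(findmnt -no OPTIONS /mount/point)
opts="defaults,autodefrag,clear_cache,space_cache=v2,noatime"

for want in space_cache=v2 noatime; do
    if has_mount_opt "$opts" "$want"; then
        echo "$want: present"
    else
        echo "$want: missing"
    fi
done
```

Wrapping the list in commas before matching means a bare "space_cache" does not count as a match for "space_cache=v2", so a v1 mount would not be misreported.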