Re: migrating to space_cache=2 and btrfs userspace commands

From: DanglingPointer <hidden>
Date: 2021-07-14 07:18:40

a) "echo l > /proc/sysrq-trigger"

Unfortunately the backup already finished today, and we are unlikely to 
run it again until we get an outage to remount the array with the 
space_cache=v2 and noatime mount options.
Thanks for the command; we'll definitely use it if/when the pauses 
happen again during the next large migration of data.
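
For next time, a minimal sketch of how that capture could be scripted so
it is ready when a stall hits. Writing "l" to /proc/sysrq-trigger asks
the kernel to log a backtrace of all active CPUs; the function name and
the SYSRQ_TRIGGER / KLOG_CMD overrides are hypothetical, added only so
the script can be dry-run without root. The one-second wait and the
500-line tail are arbitrary assumptions.

```shell
# capture_stall_trace OUTFILE
# Trigger a backtrace dump of all active CPUs, then save the tail of the
# kernel log to OUTFILE. Needs root when run against the real paths.
capture_stall_trace() {
    out=$1
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}  # hypothetical override
    klog=${KLOG_CMD:-dmesg}                        # hypothetical override

    # Same as the suggested: echo l > /proc/sysrq-trigger
    echo l > "$trigger" || return 1

    # Give the kernel a moment to write the backtraces to its log.
    sleep 1

    # Keep only the most recent part of the kernel log.
    $klog | tail -n 500 > "$out"
}
```

Run as root while a pause is in progress, for example
"capture_stall_trace /tmp/stall-trace.log", and attach the resulting
file to the reply.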


b) "sudo btrfs qgroup show -prce" ........

$ ERROR: can't list qgroups: quotas not enabled

So it looks like quotas aren't enabled.
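
In case it helps anyone scripting the same check across several hosts,
here is a rough sketch that turns the result into one word. It assumes
the command exits non-zero and prints the "quotas not enabled" error
shown above when quotas are off; classify_qgroup_output is a
hypothetical helper name, not something shipped with btrfs-progs.

```shell
# classify_qgroup_output STATUS OUTPUT
# Map the exit status and output of "btrfs qgroup show" to one word.
classify_qgroup_output() {
    status=$1
    output=$2
    if [ "$status" -eq 0 ]; then
        # The listing succeeded, so quotas are on.
        echo "enabled"
    else
        case $output in
            *"quotas not enabled"*) echo "disabled" ;;
            *) echo "unknown" ;;   # some other failure, inspect by hand
        esac
    fi
}

# Usage against a real filesystem (/mnt/array is an assumed mount point):
#   out=$(sudo btrfs qgroup show -prce /mnt/array 2>&1)
#   classify_qgroup_output $? "$out"
```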

File sizes are between 1,048,576 bytes and 16,777,216 bytes, i.e. 1 MiB 
to 16 MiB (Duplicacy backup defaults).

What classifies as a transaction?  Any/All writes done in a 30sec 
interval?  If 100 unique files were written in 30secs, is that 1 
transaction or 100 transactions?  Millions of files of the size range 
above were backed up.


c) "Just mount with "space_cache=v2""

OK, so there's no need to "clear_cache" the v1 cache, right?
I've put this in the fstab but haven't remounted yet; that has to wait 
until I can get an outage....

..."btrfs defaults,autodefrag,clear_cache,space_cache=v2,noatime  0  2"
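
Once the outage happens, a small sketch for confirming that the new
options actually took effect. It checks a comma-separated option string
such as the one findmnt prints; has_mount_opt is a hypothetical helper,
and /mnt/array is an assumed mount point. It relies on the kernel
reporting "space_cache=v2" among the active mount options when the v2
cache is in use.

```shell
# has_mount_opt OPTIONS OPT
# Succeed if the comma-separated OPTIONS string contains OPT as a whole
# option (so "space_cache" does not match "space_cache=v2").
has_mount_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after the remount (mount point is an assumption):
#   opts=$(findmnt -n -o OPTIONS /mnt/array)
#   has_mount_opt "$opts" "space_cache=v2" && echo "v2 space cache active"
#   has_mount_opt "$opts" "noatime" && echo "noatime active"
```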

Thanks again for your help Qu!

On 14/7/21 2:59 pm, Qu Wenruo wrote:


On 2021/7/13 11:38 PM, DanglingPointer wrote:

quoted

We're currently considering switching to "space_cache=v2" with noatime
mount options for my lab server-workstations running RAID5.

Btrfs RAID5 is unsafe due to its write-hole problem.

quoted

  * One has 13TB of data/metadata in a bunch of 6TB and 2TB disks
    totalling 26TB.
  * Another has about 12TB data/metadata in uniformly sized 6TB disks
    totalling 24TB.
  * Both of the arrays are on individually luks encrypted disks with
    btrfs on top of the luks.
  * Both have "defaults,autodefrag" turned on in fstab.

We're starting to see large pauses during constant backups of millions
of chunk files (using duplicacy backup) in the 24TB array.

Pauses sometimes last 20+ seconds and recur roughly 30 seconds after
the previous pause ends.  The "btrfs-transacti" process consistently
shows up as the blocking process/thread locking up filesystem IO.  IO
reaches the RAID5 array via nfsd.  There are no disk or btrfs errors
recorded.  The last scrub finished successfully yesterday.

Please provide the "echo l > /proc/sysrq-trigger" output when such a
pause happens.

If you're using qgroup (it may be enabled by things like snapper), it
may be the cause, as qgroup does its accounting when committing a
transaction.

If one transaction is very large, it can cause such a problem.

You can test if qgroup is enabled by:

# btrfs qgroup show -prce <mnt>

quoted

After some research around the internet, we arrived at the plan
described above.  Unfortunately the official documentation isn't clear
on the following.

Official documentation URL -
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)

1. How to migrate from default space_cache=v1 to space_cache=v2? It
    talks about the reverse, from v2 to v1!

Just mount with "space_cache=v2".

quoted

2. If we use space_cache=v2, is it indeed still the case that the
    "btrfs" command will NOT work with the filesystem?

Why would you think "btrfs" won't work on a btrfs?

Thanks,
Qu

quoted

  So will our
    "btrfs scrub start /mount/point/..." cron jobs FAIL?  I'm guessing
    the btrfs command comes from btrfs-progs which is currently v5.4.1-2
    amd64, is that correct?
3. Any other ideas on how we can get rid of those annoying pauses with
    large backups into the array?

Thanks in advance!

DP
