Re: migrating to space_cache=2 and btrfs userspace commands

From: Joshua <hidden>
Date: 2021-07-15 18:00:05

Just as a point of data, I have a 96 TB array with RAID1 data, and RAID1C3 metadata.

I made the switch to space_cache=v2 some time ago, and I remember it made a huge difference when I did so!
(It was RAID1 metadata at the time, as RAID1C3 was not yet available.)


However, I also tried a check with '--clear-space-cache v1' at the time, and after waiting a literal whole day without it completing, I gave up, canceled it, and put it back into production.  Is a --clear-space-cache v1 operation expected to take so long on such a large file system?

Thanks!
--Joshua Villwock



July 15, 2021 9:40 AM, "DanglingPointer" [off-list ref] wrote:

Hi Qu,

Just updating here that setting the mount option "space_cache=v2" and "noatime" completely SOLVED
the performance problem!
Basically like night and day!

These are my full fstab mount options...

btrfs defaults,autodefrag,space_cache=v2,noatime 0 2

Perhaps defaulting to space_cache=v2 should be considered?  Why default to v1, what's the value of
v1?

So in conclusion, for large multi-terabyte arrays (in my case RAID5s), setting space_cache=v2 and
noatime massively increases performance and eliminates the long pauses at frequent intervals
caused by "btrfs-transacti" blocking all IO.

Thanks Qu for your help!

On 14/7/21 5:45 pm, Qu Wenruo wrote:

quoted

On 2021/7/14 3:18 PM, DanglingPointer wrote:

quoted

a) "echo l > /proc/sysrq-trigger"

The backup finished today already unfortunately and we are unlikely to
run it again until we get an outage to remount the array with the
space_cache=v2 and noatime mount options.
Thanks for the command, we'll definitely use it if/when it happens again
on the next large migration of data.

Just to avoid confusion, after that command, "dmesg" output is still
needed, as that's where sysrq puts its output.
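
A minimal sketch of the two steps described above (trigger the sysrq, then save the kernel log), assuming a root shell. The function name, the SYSRQ_TRIGGER override and the default output path are hypothetical and only there for illustration:

```shell
# Sketch only: needs root on a real system. SYSRQ_TRIGGER and the output
# path are hypothetical knobs added for illustration.
capture_sysrq_l() {
    out="${1:-/tmp/sysrq-l.txt}"
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
    # 'l' asks the kernel to log a backtrace for all active CPUs.
    echo l > "$trigger" || return 1
    # Give the kernel a moment to finish printing before reading the log.
    sleep 1
    # The backtraces land in the kernel log, so save dmesg, not the echo.
    dmesg > "$out"
}
```

Running capture_sysrq_l while a pause is in progress should leave the backtraces in the saved file, which can then be attached to a reply.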

quoted

b) "sudo btrfs qgroup show -prce" ........

$ ERROR: can't list qgroups: quotas not enabled

So looks like it isn't enabled.

One less thing to worry about.

quoted

File sizes are between: 1,048,576 bytes and 16,777,216 bytes (Duplicacy
backup defaults)

Between 1~16MiB, thus tons of small files.

Btrfs is not really good at handling tons of small files, as they
generate a lot of metadata.

That may contribute to the hang.

quoted

What classifies as a transaction?

It's a little complex.

Technically it's a checkpoint: before the checkpoint, all you see
is old data; after the checkpoint, all you see is new data.

To end users, any data and metadata write will be included into one
transaction (with proper dependency handled).

One way to finish (or commit) current transaction is to sync the fs,
using "sync" command (sync all filesystems).

quoted

Any/All writes done in a 30sec
interval?

This is the default commit interval. Almost all filesystems will try to
commit their data/metadata to disk after a configurable interval.

The default one is 30s. That's also one way to commit the current transaction.

quoted

If 100 unique files were written in 30secs, is that 1
transaction or 100 transactions?

It depends, as things like syncfs() and subvolume/snapshot creation may
try to commit the transaction.

But without those special operations, just writing 100 unique files
using buffered writes would only start one transaction, and when the
30s interval is hit, the transaction will be committed to disk.

quoted

Millions of files of the size range
above were backed up.

The number of files may not force a transaction commit, if it doesn't
trigger enough memory pressure, or free space pressure.

Anyway, the "echo l" sysrq would help us to locate what's taking so
long.

quoted

c) "Just mount with "space_cache=v2""

Ok so no need to "clear_cache" the v1 cache, right?

Yes, and "clear_cache" won't really remove all the v1 cache anyway.

Thus it doesn't help much.

The only way to fully clear v1 cache is by using "btrfs check
--clear-space-cache v1" on an *unmounted* btrfs.
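
As a rough sketch of that offline procedure: unmount, clear the v1 cache, then mount again with v2. The device and mount point are hypothetical placeholders, and the run/DRY_RUN helper is an illustrative addition, not part of btrfs-progs. Note that Joshua's report at the top of this thread suggests the clear step can take a very long time on a large array.

```shell
# Sketch only: device and mount point are hypothetical placeholders.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

clear_v1_cache() {
    dev="$1"; mnt="$2"
    # btrfs check must operate on an unmounted filesystem.
    run umount "$mnt" || return 1
    # Remove the old v1 free space cache.
    run btrfs check --clear-space-cache v1 "$dev" || return 1
    # Mount again with the v2 cache (free space tree) and noatime.
    run mount -o space_cache=v2,noatime "$dev" "$mnt"
}
```

For example, DRY_RUN=1 clear_v1_cache /dev/sdX /mnt/array prints the three commands so they can be reviewed before running them for real.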

quoted

I wrote this in the fstab but hadn't remounted yet until I can get an
outage....

IMHO if you really want to test if v2 would help, you can just remount,
no need to wait for a break.
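
One way to confirm afterwards which cache version is in effect is to look at the options the kernel reports for the mount. This is a sketch, assuming the kernel lists v2 as "space_cache=v2" and v1 as a bare "space_cache" in /proc/mounts; the function name and mount point are hypothetical:

```shell
# Print "v2", "v1", or "none" for a comma-separated mount-option string.
# Assumption: v2 is reported as "space_cache=v2", v1 as bare "space_cache".
space_cache_version() {
    case ",$1," in
        *,space_cache=v2,*) echo v2 ;;
        *,space_cache,*|*,space_cache=v1,*) echo v1 ;;
        *) echo none ;;
    esac
}

# Usage (mount point is a placeholder):
#   space_cache_version "$(awk '$2 == "/mnt/array" {print $4}' /proc/mounts)"
```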

Thanks,
Qu

quoted

..."btrfs defaults,autodefrag,clear_cache,space_cache=v2,noatime  0  2"
Thanks again for your help Qu!

On 14/7/21 2:59 pm, Qu Wenruo wrote:

On 2021/7/13 11:38 PM, DanglingPointer wrote:
We're currently considering switching to "space_cache=v2" with noatime
mount options for my lab server-workstations running RAID5.

Btrfs RAID5 is unsafe due to its write-hole problem.

* One has 13TB of data/metadata in a bunch of 6TB and 2TB disks
totalling 26TB.
* Another has about 12TB data/metadata in uniformly sized 6TB disks
totalling 24TB.
* Both of the arrays are on individually luks encrypted disks with
btrfs on top of the luks.
* Both have "defaults,autodefrag" turned on in fstab.

We're starting to see large pauses during constant backups of millions
of chunk files (using duplicacy backup) in the 24TB array.

Pauses sometimes last 20+ seconds and recur roughly 30 seconds after
the end of the previous pause.  The "btrfs-transacti" process
consistently shows up as the blocking process/thread locking up
filesystem IO.  IO gets into the RAID5 array via nfsd. There are no disk
or btrfs errors recorded.  scrub last finished yesterday successfully.

Please provide the "echo l > /proc/sysrq-trigger" output when such a
pause happens.

If you're using qgroup (may be enabled by things like snapper), it may
be the cause, as qgroup does its accounting when committing the transaction.

If one transaction is super large, it can cause such a problem.

You can test if qgroup is enabled by:

# btrfs qgroup show -prce <mnt>
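
The same check can be wrapped for scripts. This is a sketch that assumes "btrfs qgroup show" exits non-zero when quotas are not enabled, as the "quotas not enabled" error elsewhere in this thread suggests; the function name is hypothetical, and a failure can also simply mean a wrong path or missing privileges:

```shell
# Return success only if "btrfs qgroup show" works on the given mount point.
# Assumption: the command fails when quotas are not enabled. It can also
# fail for other reasons (wrong path, not root), so treat "no" as a hint.
quotas_enabled() {
    btrfs qgroup show "$1" > /dev/null 2>&1
}

# Usage (mount point is a placeholder):
#   if quotas_enabled /mnt/array; then echo "qgroups on"; else echo "qgroups off"; fi
```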

After doing some research around the internet, we've come to the
consideration above as described.  Unfortunately the official
documentation isn't clear on the following.

Official documentation URL -
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)

1. How to migrate from default space_cache=v1 to space_cache=v2? It
talks about the reverse, from v2 to v1!

Just mount with "space_cache=v2".

2. If we use space_cache=v2, is it indeed still the case that the
"btrfs" command will NOT work with the filesystem?

Why would you think "btrfs" won't work on a btrfs?

Thanks,
Qu

So will our
"btrfs scrub start /mount/point/..." cron jobs FAIL? I'm guessing
the btrfs command comes from btrfs-progs which is currently v5.4.1-2
amd64, is that correct?
3. Any other ideas on how we can get rid of those annoying pauses with
large backups into the array?

Thanks in advance!

DP
