Thread: 16 messages, 4 authors, 2021-07-16

Re: migrating to space_cache=2 and btrfs userspace commands

From: Qu Wenruo <hidden>
Date: 2021-07-15 22:13:18


On 2021/7/16 12:40 AM, DanglingPointer wrote:
Hi Qu,

Just updating here that setting the mount option "space_cache=v2" and
"noatime" completely SOLVED the performance problem!
Basically like night and day!


These are my full fstab mount options...

btrfs defaults,autodefrag,space_cache=v2,noatime 0 2
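An fstab entry only takes effect once the filesystem is actually mounted with it, so it can be worth checking the live options. The following is a minimal sketch, not part of the original message: the helper names and the /mnt/array mount point are hypothetical, and it assumes the kernel reports the v2 cache as "space_cache=v2" in the mounts table.

```shell
# Return 0 if the comma-separated option list $1 contains the exact option $2.
has_mount_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the options of mount point $1 as listed in a mounts table
# ($2, defaulting to /proc/self/mounts). Field 2 is the mount point,
# field 4 the option list. If the mount point appears more than once,
# the last entry wins.
active_mount_opts() {
    awk -v mp="$1" '$2 == mp { opts = $4 } END { if (opts != "") print opts }' \
        "${2:-/proc/self/mounts}"
}

# Example (hypothetical mount point):
#   opts=$(active_mount_opts /mnt/array)
#   has_mount_opt "$opts" space_cache=v2 && echo "v2 cache active"
#   has_mount_opt "$opts" noatime && echo "noatime active"
```

The exact-match test means a plain "space_cache" entry is not mistaken for "space_cache=v2".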


Perhaps defaulting the space_cache=v2 should be considered?
We're already considering that.
 Why default
to v1, what's the value of v1?
One of the problems in the past was the lack of write support for v2 in
btrfs-progs.

Now we're testing making it the default in mkfs.btrfs.

Thanks,
Qu

So in conclusion, for large multi-terabyte arrays (in my case RAID5s),
setting space_cache=v2 and noatime massively increases performance and
eliminates the long pauses at frequent intervals caused by
"btrfs-transacti" blocking all IO.

Thanks Qu for your help!



On 14/7/21 5:45 pm, Qu Wenruo wrote:
quoted

On 2021/7/14 3:18 PM, DanglingPointer wrote:
quoted
a) "echo l > /proc/sysrq-trigger"

The backup finished today already unfortunately and we are unlikely to
run it again until we get an outage to remount the array with the
space_cache=v2 and noatime mount options.
Thanks for the command, we'll definitely use it if/when it happens again
on the next large migration of data.
Just to avoid confusion: after that command, the "dmesg" output is still
needed, as that's where sysrq puts its output.
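The two steps (trigger the sysrq, then collect the kernel log) could be wrapped roughly as follows. This is only a sketch: the function name, the output file name, and the SYSRQ_TRIGGER and DMESG override variables are hypothetical, added so the helper can be tried without root. On a real system it needs root and sysrq must be enabled.

```shell
# Trigger the "l" sysrq (backtrace of all active CPUs) and save the
# kernel log, which is where the sysrq output ends up.
# $1: output file (hypothetical default: sysrq-l.log)
capture_cpu_backtraces() {
    out="${1:-sysrq-l.log}"
    trigger="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

    echo l > "$trigger" || return 1
    sleep 1                      # give the kernel a moment to print
    ${DMESG:-dmesg} > "$out"     # DMESG is an assumed override for testing
}

# Example, run as root while the pause is happening:
#   capture_cpu_backtraces /tmp/pause-backtrace.log
```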
quoted

b) "sudo btrfs qgroup show -prce" ........

$ ERROR: can't list qgroups: quotas not enabled

So looks like it isn't enabled.
One less thing to worry about.
quoted
File sizes are between: 1,048,576 bytes and 16,777,216 bytes (Duplicacy
backup defaults)
Between 1~16MiB, thus tons of small files.

Btrfs is not really good at handling tons of small files, as they
generate a lot of metadata.

That may contribute to the hang.
quoted
What classifies as a transaction?
It's a little complex.

Technically it's a checkpoint: before the checkpoint, all you see is
the old data; after the checkpoint, all you see is the new data.

To end users, any data or metadata write will be included in one
transaction (with dependencies handled properly).

One way to finish (or commit) the current transaction is to sync the fs,
using the "sync" command (which syncs all filesystems).
quoted
Any/All writes done in a 30sec
interval?
This is the default commit interval. Almost all filesystems will try to
commit their data/metadata to disk after a configurable interval.

The default is 30s. That's also one way to commit the current
transaction.
quoted
  If 100 unique files were written in 30secs, is that 1
transaction or 100 transactions?
It depends, as things like syncfs() and subvolume/snapshot creation may
try to commit the transaction.

But without those special operations, just writing 100 unique files
using buffered writes would only start one transaction, and when the
30s interval is hit, the transaction will be committed to disk.
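For illustration, forcing a commit on a single filesystem rather than waiting for the interval could look like the sketch below. The function names and the DRY_RUN switch are hypothetical; "btrfs filesystem sync" is the btrfs-progs counterpart of a plain "sync" limited to one filesystem.

```shell
# Print a command, and execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# Commit the current transaction of the btrfs mounted at $1.
# Left alone, the transaction would be committed when the commit
# interval (30s by default, tunable via the commit=<seconds> mount
# option) expires.
commit_now() {
    run btrfs filesystem sync "$1"
}

# Example (hypothetical mount point):
#   DRY_RUN=1 commit_now /mnt/array
```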
quoted
  Millions of files of the size range
above were backed up.
The number of files by itself may not force a transaction commit if it
doesn't trigger enough memory pressure or free space pressure.

Anyway, the "echo l" sysrq would help us locate what's taking so long.
quoted

c) "Just mount with "space_cache=v2""

Ok so no need to "clear_cache" the v1 cache, right?
Yes, and "clear_cache" won't really remove all the v1 cache anyway.

Thus it doesn't help much.

The only way to fully clear the v1 cache is by using "btrfs check
--clear-space-cache v1" on an *unmounted* btrfs.
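Put together, an offline switch could be sketched like this. It is an illustration under assumptions, not a tested procedure: the function names, the DRY_RUN switch, and the device and mount point in the example are hypothetical, and on a multi-device array every member (here, every opened LUKS mapping) must be available before mounting.

```shell
# Print a command, and execute it unless DRY_RUN is set.
run() {
    echo "+ $*"
    [ -n "${DRY_RUN:-}" ] || "$@"
}

# $1: one member device of the filesystem, $2: mount point.
switch_to_v2() {
    dev="$1"
    mnt="$2"
    # The v1 cache can only be fully cleared while unmounted.
    run umount "$mnt" || return 1
    run btrfs check --clear-space-cache v1 "$dev" || return 1
    # Mounting with space_cache=v2 builds the v2 cache.
    run mount -o space_cache=v2,noatime "$dev" "$mnt"
}

# Example (hypothetical names), printing the commands only:
#   DRY_RUN=1 switch_to_v2 /dev/mapper/disk1 /mnt/array
```

Running it with DRY_RUN=1 first shows the exact commands before anything touches the array.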
quoted
I wrote this in the fstab but hadn't remounted yet until I can get an
outage....
IMHO if you really want to test if v2 would help, you can just remount,
no need to wait for a break.

Thanks,
Qu
quoted
..."btrfs defaults,autodefrag,clear_cache,space_cache=v2,noatime  0  2"
Thanks again for your help Qu!

On 14/7/21 2:59 pm, Qu Wenruo wrote:
quoted

On 2021/7/13 11:38 PM, DanglingPointer wrote:
quoted
We're currently considering switching to "space_cache=v2" with noatime
mount options for my lab server-workstations running RAID5.
Btrfs RAID5 is unsafe due to its write-hole problem.
quoted
  * One has 13TB of data/metadata in a bunch of 6TB and 2TB disks
    totalling 26TB.
  * Another has about 12TB data/metadata in uniformly sized 6TB disks
    totalling 24TB.
  * Both of the arrays are on individually luks encrypted disks with
    btrfs on top of the luks.
  * Both have "defaults,autodefrag" turned on in fstab.

We're starting to see large pauses during constant backups of millions
of chunk files (using duplicacy backup) in the 24TB array.

Pauses sometimes last 20+ seconds and recur roughly ~30secs after the
end of the previous pause.  The "btrfs-transacti" process consistently
shows up as the blocking process/thread locking up filesystem IO.  IO
gets into the RAID5 array via nfsd. There are no disk or btrfs errors
recorded.  The last scrub finished successfully yesterday.
Please provide the "echo l > /proc/sysrq-trigger" output when such a
pause happens.

If you're using qgroup (may be enabled by things like snapper), it may
be the cause, as qgroup does its accounting when committing a
transaction.

If one transaction is super large, it can cause such a problem.

You can test if qgroup is enabled by:

# btrfs qgroup show -prce <mnt>
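In a script, the same check could be reduced to an exit-status test, as in the sketch below. It assumes that "btrfs qgroup show" exits non-zero when quotas are not enabled (it prints an error in that case); the function name and the BTRFS override variable are hypothetical.

```shell
# Return 0 if qgroups can be listed on the btrfs mounted at $1.
# Assumption: "btrfs qgroup show" fails when quotas are disabled.
quotas_enabled() {
    ${BTRFS:-btrfs} qgroup show "$1" >/dev/null 2>&1
}

# Example (hypothetical mount point, run as root):
#   if quotas_enabled /mnt/array; then
#       echo "qgroups enabled: accounting runs at transaction commit"
#   else
#       echo "qgroups not enabled"
#   fi
```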
quoted
After doing some research around the internet, we've come to the
consideration above as described.  Unfortunately the official
documentation isn't clear on the following.

Official documentation URL -
https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)

1. How to migrate from default space_cache=v1 to space_cache=v2? It
    talks about the reverse, from v2 to v1!
Just mount with "space_cache=v2".
quoted
2. If we use space_cache=v2, is it indeed still the case that the
    "btrfs" command will NOT work with the filesystem?
Why would you think "btrfs" won't work on a btrfs?

Thanks,
Qu
quoted
  So will our
    "btrfs scrub start /mount/point/..." cron jobs FAIL? I'm guessing
    the btrfs command comes from btrfs-progs which is currently v5.4.1-2
    amd64, is that correct?
3. Any other ideas on how we can get rid of those annoying pauses with
    large backups into the array?

Thanks in advance!

DP