Re: Unexpected reflink/subvol snapshot behaviour

From: Dave Chinner <david@fromorbit.com>
Date: 2021-02-02 06:03:59
Also in: linux-xfs

On Mon, Feb 01, 2021 at 06:14:21PM -0800, Darrick J. Wong wrote:

On Fri, Jan 22, 2021 at 09:20:51AM +1100, Dave Chinner wrote:

Hi btrfs-gurus,

I'm running a simple reflink/snapshot/COW scalability test at the
moment. It is just a loop that does "fio overwrite of 10,000 4kB
random direct IOs in a 4GB file; snapshot" and I want to check a
couple of things I'm seeing with btrfs. fio config file is appended
to the email.
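
For readers following along, the loop described above might be sketched roughly as below. This is only an illustration under assumptions: the job file name `overwrite.fio`, the paths under `/mnt/test`, and the helper names are hypothetical, and the actual fio config is not reproduced in this message. It defaults to a dry run that only prints the commands.

```python
import subprocess

# Hypothetical names and paths, not taken from the original test setup.
JOB_FILE = "overwrite.fio"      # assumed fio job: 10,000 4kB random direct overwrites
TARGET = "/mnt/test/file"       # assumed 4GB test file
SUBVOL = "/mnt/test"            # assumed btrfs subvolume holding the file


def iteration_commands(i, use_reflink=True):
    """Return the two commands for iteration i: overwrite, then snapshot."""
    fio_cmd = ["fio", JOB_FILE]
    if use_reflink:
        # File clone (reflink) of the test file.
        snap_cmd = ["cp", "--reflink=always", TARGET, f"{TARGET}.snap.{i}"]
    else:
        # btrfs subvolume snapshot of the containing subvolume.
        snap_cmd = ["btrfs", "subvolume", "snapshot", SUBVOL, f"{SUBVOL}/snap.{i}"]
    return [fio_cmd, snap_cmd]


def run(iterations=1000, use_reflink=True, dry_run=True):
    """Run (or, by default, just print) the overwrite/snapshot loop."""
    for i in range(iterations):
        for cmd in iteration_commands(i, use_reflink):
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)
```

Calling `run(iterations=2)` prints the four commands for the first two cycles without touching the filesystem.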

Firstly, what is the expected "space amplification" of such a
workload over 1000 iterations on btrfs? This will write 40GB of user
data, and I'm seeing btrfs consume ~220GB of space for the workload
regardless of whether I use subvol snapshot or file clones
(reflink).  That's a space amplification of ~5.5x (a lot!) so I'm
wondering if this is expected or whether there's something else
going on. XFS amplification for 1000 iterations using reflink is
only 1.4x, so 5.5x seems somewhat excessive to me.

On a similar note, the IO bandwidth consumed by btrfs is way out of
proportion with the amount of user data being written. I'm seeing
multiple GBs being written by btrfs on every iteration - easily
exceeding 5GB of writes per cycle in the later iterations of the
test. Given that only 40MB of user data is being written per cycle,
there's a write amplification factor of well over 100x occurring
here. In comparison, XFS is writing roughly consistently at 80MB/s
to disk over the course of the entire workload, largely because of
journal traffic for the transactions run during COW and clone
operations.  Is such a huge amount of IO expected for btrfs in
this situation?
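
The amplification figures quoted above can be reproduced with a few lines of arithmetic. This is a sketch using the rounded numbers from the message (4kB treated as 4000 bytes, 5GB of writes per cycle taken as the lower bound); the function and variable names are illustrative only.

```python
def amplification(consumed, user_data):
    """Ratio of what the filesystem consumed or wrote to the user data written."""
    return consumed / user_data


# Workload parameters as stated in the message (decimal units, rounded).
ios_per_cycle = 10_000
io_size_kb = 4
iterations = 1000

user_mb_per_cycle = ios_per_cycle * io_size_kb / 1000   # 40 MB per cycle
user_gb_total = user_mb_per_cycle * iterations / 1000   # 40 GB over the run

# Space: ~220GB consumed on btrfs for 40GB of user data.
space_amp = amplification(220, user_gb_total)

# Writes: at least ~5GB (5000 MB) written per cycle in later iterations.
write_amp = amplification(5 * 1000, user_mb_per_cycle)

print(f"space amplification: {space_amp:.1f}x")
print(f"write amplification: at least {write_amp:.0f}x")
```

With these inputs the space figure comes out at 5.5x and the write figure at 125x, consistent with "well over 100x".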

<just gonna snip this part>

FYI, I've compared btrfs reflink to XFS reflink, too, and XFS fio
performance stays largely consistent across all 1000 iterations at
around 13-14k +/-2k IOPS. The reflink time also scales linearly with
the number of extents in the source file and levels off at about
10-11s per cycle as the extent count in the source file levels off
at ~850,000 extents. XFS completes the 1000 iterations of
write/clone in about 4 hours, btrfs completes the same part of the
workload in about 9 hours.

Just out of curiosity, do any of the patches in [1] improve those
numbers for xfs?  As you noted a long time ago, the transaction
reservations are kind of huge, so I fixed those and shook out a few
other warts while I was at it.

I'll give it a spin, but my initial reaction is "I don't think so".
The workload does not have the concurrency necessary to be
sensitive to log reservation space running out...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
