Re: Unexpected reflink/subvol snapshot behaviour
From: Dave Chinner <david@fromorbit.com>
Date: 2021-02-02 06:03:59
Also in:
linux-xfs
On Mon, Feb 01, 2021 at 06:14:21PM -0800, Darrick J. Wong wrote:
On Fri, Jan 22, 2021 at 09:20:51AM +1100, Dave Chinner wrote:
Hi btrfs-gurus,

I'm running a simple reflink/snapshot/COW scalability test at the moment. It is just a loop that does "fio overwrite of 10,000 4kB random direct IOs in a 4GB file; snapshot" and I want to check a couple of things I'm seeing with btrfs. fio config file is appended to the email.

Firstly, what is the expected "space amplification" of such a workload over 1000 iterations on btrfs? This will write 40GB of user data, and I'm seeing btrfs consume ~220GB of space for the workload regardless of whether I use subvol snapshot or file clones (reflink). That's a space amplification of ~5.5x (a lot!) so I'm wondering if this is expected or whether there's something else going on. XFS amplification for 1000 iterations using reflink is only 1.4x, so 5.5x seems somewhat excessive to me.

On a similar note, the IO bandwidth consumed by btrfs is way out of proportion with the amount of user data being written. I'm seeing multiple GBs being written by btrfs on every iteration - easily exceeding 5GB of writes per cycle in the later iterations of the test. Given that only 40MB of user data is being written per cycle, there's a write amplification factor of well over 100x occurring here. In comparison, XFS is writing roughly consistently at 80MB/s to disk over the course of the entire workload, largely because of journal traffic for the transactions run during COW and clone operations. Is such a huge amount of IO expected for btrfs in this situation?

<just gonna snip this part>
FYI, I've compared btrfs reflink to XFS reflink, too, and XFS fio performance stays largely consistent across all 1000 iterations at around 13-14k +/-2k IOPS. The reflink time also scales linearly with the number of extents in the source file and levels off at about 10-11s per cycle as the extent count in the source file levels off at ~850,000 extents. XFS completes the 1000 iterations of write/clone in about 4 hours, btrfs completes the same part of the workload in about 9 hours.

Just out of curiosity, do any of the patches in [1] improve those numbers for xfs? As you noted a long time ago, the transaction reservations are kind of huge, so I fixed those and shook out a few other warts while I was at it.
I'll give it a spin, but my initial reaction is "I don't think so". The workload does not have the concurrency necessary to be sensitive to log reservation space running out...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
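
For anyone wanting to check the amplification figures quoted in the thread, the arithmetic can be sketched roughly as below. The inputs are the numbers stated in the original message (10,000 4kB IOs per cycle, 1000 cycles, ~220GB consumed, ~5GB written per late cycle). Decimal units (1kB = 1000 bytes) are an assumption made so the result matches the rounded "40MB" and "40GB" in the mail, and the variable and function names are illustrative only, not part of any test harness.

```python
# Rough check of the amplification figures quoted in the thread.
# Assumption: decimal units, so that 10,000 x 4kB comes out as the
# "40MB per cycle" stated in the mail.
KB = 1000
MB = 1000 ** 2
GB = 1000 ** 3

ios_per_cycle = 10_000      # random direct IOs per fio run
io_size = 4 * KB            # 4kB per IO
cycles = 1000               # overwrite + snapshot iterations

user_bytes_per_cycle = ios_per_cycle * io_size    # 40MB
total_user_bytes = user_bytes_per_cycle * cycles  # 40GB


def amplification(bytes_consumed, user_bytes):
    """Ratio of bytes consumed (or written) to user data written."""
    return bytes_consumed / user_bytes


# ~220GB of space consumed for 40GB of user data.
space_amp = amplification(220 * GB, total_user_bytes)

# ~5GB written per cycle in later iterations for 40MB of user data.
write_amp = amplification(5 * GB, user_bytes_per_cycle)

print(f"user data per cycle: {user_bytes_per_cycle / MB:.0f}MB")
print(f"total user data:     {total_user_bytes / GB:.0f}GB")
print(f"space amplification: {space_amp:.1f}x")
print(f"write amplification: {write_amp:.0f}x")
```

With those inputs the space figure works out to the ~5.5x quoted, and 5GB per 40MB cycle gives 125x, consistent with "well over 100x".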