Re: [PATCH] btrfs: make btrfs_dirty_inode() to always reserve metadata space
From: Qu Wenruo <hidden>
Date: 2021-02-19 00:21:17
On 2021/2/18 下午11:28, Nikolay Borisov wrote:
On 8.01.21 г. 7:36 ч., Qu Wenruo wrote:quoted
There are several qgroup flush related bugs fixed recently, all of them are caused by the fact that we can trigger qgroup metadata space reservation holding a transaction handle. Thankfully the only situation to trigger above reservation is btrfs_dirty_inode(). Currently btrfs_dirty_inode() will try join transactio first, then update the inode. If btrfs_update_inode() fails with -ENOSPC, then it retry to start transaction to reserve metadata space. This not only forces us to reserve metadata space with a transaction handle hold, but can't handle other errors like -EDQUOT. This patch will make btrfs_dirty_inode() to call btrfs_start_transaction() directly without first try joining then starting, so that in try_flush_qgroup() we won't hold a trans handle. This will slow down btrfs_dirty_inode() but my fstests doesn't show too much different for most test cases, thus it may be worthy to skip such performance "optimization". Signed-off-by: Qu Wenruo <redacted>Ok I actually run 2 tests against this patch. The first one is a 10 second run of stress-ng's utime test (stress-ng --temp-path /media/scratch --utime 4 -M -t 10 ; done) to see if I can reproduce intel's results and here's what I found: bogo ops/s real (Before-patch) bogo ops/s real (After Patch) 35993 32968 35712 33146 35369 32996 35544 33159 35623 33000 35939 33016 35693 32829 35562 32685 35675 32815 Std dev 182.161981912585 146.829034703967 HMean 35677.9600871036 32957.1111111111 Diff%: -7.626 So there's a 7.6% decrease in the rate of utime() calls we can make, given that we now start a transaction I'd say that's expected. The other test was a randwrite with fio as I was mostly worried that making btrfs_dirty_inode more expensive would hit write performance since file_update_times is called from the generic iter. But inspecting the code btrfs uses update_time_for_write which doesn't dirty the inode per-se as this is deferred to endio completion time. I also measured the impact during buffered read time as file_accessed is called a lot of times but the following bpftrace script: BEGIN {@execs = 0; } kprobe:btrfs_dirty_inode { @test[kstack] = count(); @execs++; } kprobe:touch_atime { @test[kstack] = count(); } END{ printf("Total btrfs_dirty_inode calls: %llu\n", @execs); } confirmed we only ever execute around 8 btrfs_dirty_inode out of 1048773 execution of touch_atimes from generic_file_buffered_read with the following fio workload: fio --name=random-readers --thread --ioengine=sync --iodepth=4 --rw=randread --bs=4k --direct=0 --size=1g --numjobs=4 --directory=/media/scratch --filename_format=FioWorkloads.\$jobnum --new_group --group_reporting=1 So performance-wise I'm inclined to give it a "pass".
Great. Mind me to add such info into the commit message and add you as sob? Thanks, Qu