Thread (9 messages) 9 messages, 5 authors, 2021-02-19

Re: [PATCH] btrfs: make btrfs_dirty_inode() to always reserve metadata space

From: Qu Wenruo <hidden>
Date: 2021-02-19 00:21:17


On 2021/2/18 下午11:28, Nikolay Borisov wrote:

On 8.01.21 г. 7:36 ч., Qu Wenruo wrote:
quoted
There are several qgroup flush related bugs fixed recently, all of them
are caused by the fact that we can trigger qgroup metadata space
reservation holding a transaction handle.

Thankfully the only situation to trigger above reservation is
btrfs_dirty_inode().

Currently btrfs_dirty_inode() will try join transactio first, then
update the inode.
If btrfs_update_inode() fails with -ENOSPC, then it retry to start
transaction to reserve metadata space.

This not only forces us to reserve metadata space with a transaction
handle hold, but can't handle other errors like -EDQUOT.

This patch will make btrfs_dirty_inode() to call
btrfs_start_transaction() directly without first try joining then
starting, so that in try_flush_qgroup() we won't hold a trans handle.

This will slow down btrfs_dirty_inode() but my fstests doesn't show too
much different for most test cases, thus it may be worthy to skip such
performance "optimization".

Signed-off-by: Qu Wenruo <redacted>

Ok I actually run 2 tests against this patch. The first one is a 10
second run of  stress-ng's utime test (stress-ng --temp-path
/media/scratch --utime 4 -M -t 10 ; done) to see if I can reproduce
intel's results and here's what I found:


bogo ops/s real (Before-patch)	bogo ops/s real (After Patch)
	35993	                         32968
	35712	                         33146
	35369	                         32996
	35544	                         33159
	35623	                         33000
	35939	                         33016
	35693	                         32829
	35562	                         32685
	35675	                         32815
Std dev	182.161981912585	146.829034703967
HMean	35677.9600871036	32957.1111111111
Diff%:		                -7.626

So there's a 7.6% decrease in the rate of utime() calls we can make,
given that we now start a transaction I'd say that's expected.

The other test was a randwrite with fio as I was mostly worried that
making btrfs_dirty_inode more expensive would hit write performance
since file_update_times is called from the generic iter. But inspecting
the code btrfs uses update_time_for_write which doesn't dirty the inode
per-se as this is deferred to endio completion time.  I also measured
the impact during buffered read time as file_accessed is called a lot of
times but the following bpftrace script:

BEGIN {@execs = 0; }
kprobe:btrfs_dirty_inode
{
	@test[kstack] = count();
	@execs++;
}

kprobe:touch_atime
{
	@test[kstack] = count();
}
END{
	printf("Total btrfs_dirty_inode calls: %llu\n", @execs);
}


confirmed we only ever execute around 8 btrfs_dirty_inode out of 1048773
execution of touch_atimes from generic_file_buffered_read with the
following fio workload:

fio --name=random-readers --thread --ioengine=sync --iodepth=4
--rw=randread --bs=4k --direct=0 --size=1g --numjobs=4
--directory=/media/scratch --filename_format=FioWorkloads.\$jobnum
--new_group --group_reporting=1


So performance-wise I'm inclined to give it a "pass".
Great. Mind me to add such info into the commit message and add you as sob?

Thanks,
Qu
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help