Thread (5 messages) 5 messages, 4 authors, 2021-06-30

Re: [PATCH] btrfs: Stop kmalloc'ing btrfs_path in btrfs_lookup_bio_sums

From: Nikolay Borisov <hidden>
Date: 2021-06-30 07:39:33


On 29.06.21 г. 14:56, Qu Wenruo wrote:

On 2021/6/28 下午11:06, Nikolay Borisov wrote:
quoted
While doing some performance characterization of a workoad reading ~80g
split among 4mb files via DIO I observed that btrfs_lookup_bio_sums was
using rather excessive cpu cycles:

95.97%--ksys_pread64
95.96%--new_sync_read
95.95%--iomap_dio_rw
95.89%--iomap_apply
72.25%--iomap_dio_bio_actor
55.19%--iomap_dio_submit_bio
55.13%--btrfs_submit_direct
20.72%--btrfs_lookup_bio_sums
7.44%-- kmem_cache_alloc

Timing the workload yielded:
real 4m41.825s
user 0m14.321s
sys 10m38.419s

Turns out this kmem_cache_alloc corresponds to the btrfs_path allocation
that happens in btrfs_lookup_bio_sums. Nothing in btrfs_lookup_bio_sums
warrants having the path be heap allocated. Just turn this allocation
into a stack based one. After the change performance changes
accordingly:

real 4m21.742s
user 0m13.679s
sys 9m8.889s

All in all there's a 20 seconds (~7%) reductino in real run time as well
as the time spent in the kernel is reduced by 1:30 minutes (~14%). And
btrfs_lookup_bio_sums ends up accounting for only 7.91% of cycles rather
than 20% before.

Signed-off-by: Nikolay Borisov <redacted>
I'm surprised that btrfs_path in kernel no longer needs any initialization.

I guess we can do a lot of more such conversion to use stack btrfs_path,
just like what we did in btrfs-progs.

But I'm not that sure if 112 bytes stack memory usage increase is OK or
not for other locations.

But since this one get a pretty good performance increase, and this
function itself doesn't have much stack memory usage anyway, it looks ok
to me.

Reviewed-by: Qu Wenruo <redacted>
Actually this patch makes most sense when SLUB debugging is enabled as
it would seem btrfs is hit pretty hard when debugging is enabled. IN
real world with SLUB debugging disabled it didn't make any noticeable
difference. In this case the fact that I'm adding 112 bytes on the stack
might bring more harm than good so I guess this patch can be dropped.

<snip>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help