
Re: [RFC PATCH 9/9] iov_iter: Add benchmarking kunit tests for UBUF/IOVEC

From: David Howells <dhowells@redhat.com>
Date: 2023-09-15 12:20:52
Also in: linux-block, linux-fsdevel, linux-kselftest, linux-mm, lkml

David Laight [off-list ref] wrote:
> Isn't that going to be completely dominated by the cache fills
> from memory?
>
> I'd have thought you'd need to use something with a lot of
> small fragments so that the iteration code dominates the copy.

Okay, suppose I switch it to using MAP_ANON for the big 256MiB buffer, switch
all the benchmarking tests to use copy_from_iter() rather than copy_to_iter()
and make the iovec benchmark use a separate iovec for each page. There's then
a single page replicated across the mapping.
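
To make the "separate iovec for each page" layout concrete, here is a minimal
userspace sketch. It is not the kunit test code from the patch: the helper
names map_anon_buffer() and build_per_page_iovec() are hypothetical, and it
uses plain mmap() rather than whatever the in-kernel test does. On Linux, pages
of a private anonymous mapping that are never written are typically backed by
the shared zero page when read, which may be what "a single page replicated
across the mapping" amounts to here.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `size` bytes of private anonymous memory.
 * Nothing is written to it, so reads return zeroes.
 * Returns NULL on failure.
 */
static void *map_anon_buffer(size_t size)
{
	void *p = mmap(NULL, size, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: describe the buffer with one iovec per page, so
 * that an iterator over it has to step to a new segment every page.
 * `size` is assumed to be a multiple of `page_size`.
 * The caller frees the returned array.
 */
static struct iovec *build_per_page_iovec(void *buf, size_t size,
					  size_t page_size, size_t *nr_segs)
{
	size_t n = size / page_size;
	struct iovec *iov = calloc(n, sizeof(*iov));

	if (!iov)
		return NULL;

	for (size_t i = 0; i < n; i++) {
		iov[i].iov_base = (char *)buf + i * page_size;
		iov[i].iov_len = page_size;
	}

	*nr_segs = n;
	return iov;
}
```

With 4KiB pages, a 256MiB buffer described this way needs 65536 segments, so
the per-segment iteration cost is paid that many times per pass.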

Given that, without my macro-to-inline-func patches applied, I see:

	iov_kunit_benchmark_bvec: avg 3184 uS, stddev 16 uS
	iov_kunit_benchmark_bvec: avg 3189 uS, stddev 17 uS
	iov_kunit_benchmark_bvec: avg 3190 uS, stddev 16 uS
	iov_kunit_benchmark_bvec_outofline: avg 3731 uS, stddev 10 uS
	iov_kunit_benchmark_bvec_outofline: avg 3735 uS, stddev 10 uS
	iov_kunit_benchmark_bvec_outofline: avg 3738 uS, stddev 11 uS
	iov_kunit_benchmark_bvec_split: avg 3403 uS, stddev 10 uS
	iov_kunit_benchmark_bvec_split: avg 3405 uS, stddev 18 uS
	iov_kunit_benchmark_bvec_split: avg 3407 uS, stddev 29 uS
	iov_kunit_benchmark_iovec: avg 6616 uS, stddev 20 uS
	iov_kunit_benchmark_iovec: avg 6619 uS, stddev 22 uS
	iov_kunit_benchmark_iovec: avg 6621 uS, stddev 46 uS
	iov_kunit_benchmark_kvec: avg 2671 uS, stddev 12 uS
	iov_kunit_benchmark_kvec: avg 2671 uS, stddev 13 uS
	iov_kunit_benchmark_kvec: avg 2675 uS, stddev 12 uS
	iov_kunit_benchmark_ubuf: avg 6191 uS, stddev 1946 uS
	iov_kunit_benchmark_ubuf: avg 6418 uS, stddev 3263 uS
	iov_kunit_benchmark_ubuf: avg 6443 uS, stddev 3275 uS
	iov_kunit_benchmark_xarray: avg 3689 uS, stddev 5 uS
	iov_kunit_benchmark_xarray: avg 3689 uS, stddev 6 uS
	iov_kunit_benchmark_xarray: avg 3698 uS, stddev 22 uS
	iov_kunit_benchmark_xarray_outofline: avg 4202 uS, stddev 3 uS
	iov_kunit_benchmark_xarray_outofline: avg 4204 uS, stddev 9 uS
	iov_kunit_benchmark_xarray_outofline: avg 4210 uS, stddev 9 uS

and with them applied, I get:

	iov_kunit_benchmark_bvec: avg 3241 uS, stddev 13 uS
	iov_kunit_benchmark_bvec: avg 3245 uS, stddev 16 uS
	iov_kunit_benchmark_bvec: avg 3248 uS, stddev 15 uS
	iov_kunit_benchmark_bvec_outofline: avg 3705 uS, stddev 12 uS
	iov_kunit_benchmark_bvec_outofline: avg 3706 uS, stddev 10 uS
	iov_kunit_benchmark_bvec_outofline: avg 3709 uS, stddev 9 uS
	iov_kunit_benchmark_bvec_split: avg 3446 uS, stddev 10 uS
	iov_kunit_benchmark_bvec_split: avg 3447 uS, stddev 12 uS
	iov_kunit_benchmark_bvec_split: avg 3448 uS, stddev 12 uS
	iov_kunit_benchmark_iovec: avg 6587 uS, stddev 22 uS
	iov_kunit_benchmark_iovec: avg 6587 uS, stddev 22 uS
	iov_kunit_benchmark_iovec: avg 6590 uS, stddev 27 uS
	iov_kunit_benchmark_kvec: avg 2671 uS, stddev 12 uS
	iov_kunit_benchmark_kvec: avg 2672 uS, stddev 12 uS
	iov_kunit_benchmark_kvec: avg 2676 uS, stddev 19 uS
	iov_kunit_benchmark_ubuf: avg 6241 uS, stddev 2199 uS
	iov_kunit_benchmark_ubuf: avg 6266 uS, stddev 2245 uS
	iov_kunit_benchmark_ubuf: avg 6513 uS, stddev 3899 uS
	iov_kunit_benchmark_xarray: avg 3695 uS, stddev 6 uS
	iov_kunit_benchmark_xarray: avg 3695 uS, stddev 7 uS
	iov_kunit_benchmark_xarray: avg 3703 uS, stddev 11 uS
	iov_kunit_benchmark_xarray_outofline: avg 4215 uS, stddev 16 uS
	iov_kunit_benchmark_xarray_outofline: avg 4217 uS, stddev 20 uS
	iov_kunit_benchmark_xarray_outofline: avg 4224 uS, stddev 10 uS

Interestingly, most of them are quite tight, but UBUF is all over the place.
That's with the test covering the entire 256MiB span with a single UBUF
iterator, so it would seem unlikely that the difference is due to the
iteration framework.
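
One way to put a number on "tight" versus "all over the place" is the stddev
as a percentage of the average. The helper below is only an illustrative
sketch; rel_spread_pct() is a hypothetical name and is not part of the
benchmark code.

```c
/*
 * Hypothetical helper: relative spread of one benchmark line, i.e. the
 * reported stddev as a percentage of the reported average.
 * Both arguments are in microseconds, as printed by the tests.
 */
static double rel_spread_pct(double avg_us, double stddev_us)
{
	return 100.0 * stddev_us / avg_us;
}
```

Fed the first unpatched lines above, rel_spread_pct(6191, 1946) for ubuf comes
out at roughly 31%, whereas rel_spread_pct(2671, 12) for kvec is under 0.5%.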

David