From: David Howells
Sent: 15 September 2023 11:10
David Laight [off-list ref] wrote:
Add kunit tests to benchmark 256MiB copies to a UBUF iterator and an IOVEC
iterator. This attaches a userspace VM with a mapped file in it
temporarily to the test thread.
Isn't that going to be completely dominated by the cache fills
from memory?
Yes... but it should be consistent in the amount of time that consumes since
no device drivers are involved. I can try adding the same folio to the
anon_file multiple times - it might work especially if I don't put the pages
on the LRU (if that's even possible) - but I wanted separate pages for the
extraction test.
You could also just not do the copy!
Although you need (say) asm volatile("\n" ::: "memory") to
stop it all being completely optimised away.
That might show up a difference in the 'out_of_line' test
where 15% on top of the data copies is massive - it may be
that the data cache behaviour is very different for the
two cases.
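
A minimal userspace sketch of the "don't do the copy" idea, assuming GCC or Clang extended asm. The names compiler_barrier() and iterate_no_copy() are made up for illustration and are not from the patch. The loop walks a length in chunk-sized steps and does nothing per step except a compiler barrier, so only the iteration overhead remains to be timed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compiler-only barrier: emits no instructions, but the "memory" clobber
 * tells the compiler that memory may have changed, and "volatile" stops
 * the statement (and therefore the loop around it) from being deleted.
 */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical helper: step through `len` bytes in `chunk`-sized pieces
 * without copying anything. Returns the number of steps taken.
 */
static size_t iterate_no_copy(size_t len, size_t chunk)
{
	size_t steps = 0;

	assert(chunk > 0);
	for (size_t off = 0; off < len; off += chunk) {
		/* The data copy would go here; it is deliberately omitted. */
		compiler_barrier();
		steps++;
	}
	return steps;
}
```

Timing a call such as iterate_no_copy(256UL << 20, 4096) with and without the barrier should show whether the compiler was collapsing the loop. In kernel code the existing barrier() macro would serve the same purpose.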
Some measurements can be made using readv() and writev()
on /dev/zero and /dev/null.
Forget /dev/null; that doesn't actually engage any iteration code. The same
for writing to /dev/zero. Reading from /dev/zero does its own iteration thing
rather than using iterate_and_advance(), presumably because it checks for
signals and resched.
Using /dev/null does exercise the 'copy iov from user' code.
Last time I looked at that, the 32bit compat code was faster than
the 64bit code on x86!
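
A rough userspace sketch of the measurement being discussed, assuming a Linux system with /dev/null and /dev/zero. The helper name time_vectored_io() is hypothetical. It times repeated writev() or readv() calls on a device node. Going by the thread, the /dev/null write case should mostly reflect the cost of importing the iovec array from userspace, while the /dev/zero read case goes through that driver's own loop.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue `calls` vectored I/O calls of `nsegs`
 * segments each on `path`. Uses writev() if do_write is non-zero,
 * readv() otherwise. Returns elapsed nanoseconds, or -1 on error.
 */
static int64_t time_vectored_io(const char *path, int do_write,
				const struct iovec *iov, int nsegs, int calls)
{
	struct timespec t0, t1;
	int fd;

	assert(nsegs > 0);
	fd = open(path, do_write ? O_WRONLY : O_RDONLY);
	if (fd < 0)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < calls; i++) {
		ssize_t n = do_write ? writev(fd, iov, nsegs)
				     : readv(fd, iov, nsegs);
		if (n < 0) {
			close(fd);
			return -1;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(fd);

	return (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	       (t1.tv_nsec - t0.tv_nsec);
}
```

Varying nsegs while keeping the total byte count fixed would separate the per-segment cost from the per-byte cost. Building the same program with -m32 is one way to compare the compat path against the native 64-bit one.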
David