Thread (26 messages) 26 messages, 3 authors, 2021-08-30

Re: [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit()

From: Petr Mladek <pmladek@suse.com>
Date: 2021-08-30 12:13:05
Also in: kvm, linux-arch, linux-kselftest, linux-mm, linux-mmc, lkml

On Thu 2021-08-26 14:09:55, Yury Norov wrote:
On Thu, Aug 26, 2021 at 03:57:13PM +0200, Petr Mladek wrote:
quoted
On Sat 2021-08-14 14:17:07, Yury Norov wrote:
quoted
The macros iterate thru all set/clear bits in a bitmap. They search a
first bit using find_first_bit(), and the rest bits using find_next_bit().

Since find_next_bit() is called shortly after find_first_bit(), we can
save few lines of I-cache by not using find_first_bit().
Is this only a speculation or does it fix a real performance problem?

The macro is used like:

	for_each_set_bit(bit, addr, size) {
		fn(bit);
	}

IMHO, the micro-opimization does not help when fn() is non-trivial.
 
The effect is measurable:

Start testing for_each_bit()
for_each_set_bit:                15296 ns,   1000 iterations
for_each_set_bit_from:           15225 ns,   1000 iterations

Start testing for_each_bit() with cash flushing
for_each_set_bit:               547626 ns,   1000 iterations
for_each_set_bit_from:          497899 ns,   1000 iterations

Refer this:

https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg356151.html
I see. The results look convincing on the first look.

But I am still not sure. This patch is basically contradicting many
other patches from this patchset:

  + 5th patch optimizes find_first_and_bit() and proves that it is
    much faster:

    Before (#define find_first_and_bit(...) find_next_and_bit(..., 0):
    Start testing find_bit() with random-filled bitmap
    [  140.291468] find_first_and_bit:           46890919 ns,  32671 iterations
    Start testing find_bit() with sparse bitmap
    [  140.295028] find_first_and_bit:               7103 ns,      1 iterations

    After:
    Start testing find_bit() with random-filled bitmap
    [  162.574907] find_first_and_bit:           25045813 ns,  32846 iterations
    Start testing find_bit() with sparse bitmap
    [  162.578458] find_first_and_bit:               4900 ns,      1 iterations

       => saves 46% in random bitmap
	  saves 31% in sparse bitmap


  + 6th, 7th, and 9th patch makes the code use find_first_bit()
    because it is faster than find_next_bit(mask, size, 0);

  + Now, 11th (this) patch replaces find_first_bit() with
    find_next_bit(mask, size, 0) because find_first_bit()
    makes things slower. It is suspicious at minimum.


By other words. The I-cache could safe 10% in one case.
But find_first_bit() might safe 46% in random case.

Does I-cache cost more than the faster code?

Or was for_each_set_bit() tested only with a bitmap
where find_first_bit() optimization did not help much?

How would for_each_set_bit() work with random bitmap?
How does it work with larger bitmaps?

Best Regards,
Petr
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help