Thread (33 messages) 33 messages, 9 authors, 2021-12-15

Re: [PATCH 0/9] lib/bitmap: optimize bitmap_weight() usage

From: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Date: 2021-11-29 16:36:35
Also in: kvm, linux-alpha, linux-crypto, linux-mips, linux-mm, linux-perf-users, linux-riscv, linux-s390, lkml

Dnia 29 listopada 2021 06:38:39 UTC, Yury Norov [off-list ref] napisał/a:
On Sun, Nov 28, 2021 at 07:03:41PM +0100, mirq-test@rere.qmqm.pl wrote:
quoted
On Sat, Nov 27, 2021 at 07:56:55PM -0800, Yury Norov wrote:
quoted
In many cases people use bitmap_weight()-based functions like this:

	if (num_present_cpus() > 1)
		do_something();

This may take considerable amount of time on many-cpus machines because
num_present_cpus() will traverse every word of underlying cpumask
unconditionally.

We can significantly improve on it for many real cases if stop traversing
the mask as soon as we count present cpus to any number greater than 1:

	if (num_present_cpus_gt(1))
		do_something();

To implement this idea, the series adds bitmap_weight_{eq,gt,le}
functions together with corresponding wrappers in cpumask and nodemask.
Having slept on it I have more structured thoughts:

First, I like substituting bitmap_empty/full where possible - I think
the change stands on its own, so could be split and sent as is.
Ok, I can do it.
quoted
I don't like the proposed API very much. One problem is that it hides
the comparison operator and makes call sites less readable:

	bitmap_weight(...) > N

becomes:

	bitmap_weight_gt(..., N)

and:
	bitmap_weight(...) <= N

becomes:

	bitmap_weight_lt(..., N+1)
or:
	!bitmap_weight_gt(..., N)

I'd rather see something resembling memcmp() API that's known enough
to be easier to grasp. For above examples:

	bitmap_weight_cmp(..., N) > 0
	bitmap_weight_cmp(..., N) <= 0
	...
bitmap_weight_cmp() cannot be efficient. Consider this example:

bitmap_weight_lt(1000 0000 0000 0000, 1) == false
                ^
                stop here

bitmap_weight_cmp(1000 0000 0000 0000, 1) == 0
                                ^
                                stop here

I agree that '_gt' is less verbose than '>', but the advantage of 
'_gt' over '>' is proportional to length of bitmap, and it means
that this API should exist.
Thank you for the example. Indeed, for less-than to be efficient here you would need to replace
 bitmap_weight_cmp(..., N) < 0
with
 bitmap_weight_cmp(..., N-1) <= 0

It would still be more readable, I think.

Best Regards
Michał Mirosław
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help