Thread (10 messages) 10 messages, 3 authors, 2023-08-01

Re: What is the status of GSoC 2022 work on making Git use roaring bitmaps?

From: Jakub Narębski <hidden>
Date: 2023-08-01 17:44:24

Hello

On Tue, 1 Aug 2023 at 19:34, Taylor Blau [off-list ref] wrote:
On Tue, Aug 01, 2023 at 01:34:32PM +0200, Jakub Narębski wrote:
quoted
On Tue, 1 Aug 2023 at 13:26, Han-Wen Nienhuys [off-list ref] wrote:
quoted
On Mon, Jul 31, 2023 at 10:18 PM Taylor Blau [off-list ref] wrote:
quoted
I haven't proved conclusively one way or the other where Roaring+Run is
significantly faster than EWAH or vice-versa. There are some cases where
the former is a clear winner, and other cases where it's the latter.

In any event, my extremely WIP patches to make this mostly work are
available here:

  https://github.com/ttaylorr/git/compare/tb/roaring-bitmaps
thanks. For anyone reading along, the changes to JGit are here

https://git.eclipse.org/r/c/jgit/jgit/+/203448

I was looking into this because I was hoping that roaring might
decrease peak memory usage.

I don't have firm evidence that it's better or worse, but I did
observe that runtime and memory usage during GC (which is heavy on
bitmap operations due to delta/xor encoding) was unchanged. That makes
me pessimistic that there are significant gains to be had.
The major advantage Roaring bitmaps have over EWAH and other
simple Run Length Encoding based compression algorithms is that
bitmap operations can be done on compressed bitmaps: there is no
need to uncompress bitmap to do (want1 OR want2 AND NOT have).
Yeah, this is definitely where the majority of CPU savings seems to
remain. The existing implementation in my branch is much too eager to
uncompress bitmaps when we need to perform a logical/binary operation on
them.
As I understand it, the current code in Git (in C implementation) uses
uncompressed bitmap to store the result of OR-ing, but it uses compressed
EWAH to perform <uncompressed result> OR <EWAH-compressed bitmap>.
I think with some more surgery we could leave bitmaps in a compressed
state for longer. I am not sure whether or not we should ever uncompress
the bitmaps, though it's possible that doing so is beneficial since
uncompressed bitmaps have better query performance (albeit more costly
memory usage).
I'm not sure if it would be worth the complexity, but supposedly you can
perform L bitwise OR operations in O(log(L)) instead of O(L). Current C code
computes OR operation sequentially, see above, and also
  https://github.com/git/git/blob/master/pack-bitmap.c#L103\0

There might be thousands of "wants" and of "haves", but is it common?

Best,
-- 
Jakub Narębski
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help