Thread (10 messages) 10 messages, 3 authors, 2023-08-01

Re: What is the status of GSoC 2022 work on making Git use roaring bitmaps?

From: Jakub Narębski <hidden>
Date: 2023-08-01 13:18:07

Hello,

On Tue, 1 Aug 2023 at 13:54, Han-Wen Nienhuys [off-list ref] wrote:
On Tue, Aug 1, 2023 at 1:35 PM Jakub Narębski [off-list ref] wrote:
quoted
On Tue, 1 Aug 2023 at 13:26, Han-Wen Nienhuys [off-list ref] wrote:
quoted
On Mon, Jul 31, 2023 at 10:18 PM Taylor Blau [off-list ref] wrote:
quoted
I haven't proved conclusively one way or the other where Roaring+Run is
significantly faster than EWAH or vice-versa. There are some cases where
the former is a clear winner, and other cases where it's the latter.

In any event, my extremely WIP patches to make this mostly work are
available here:

  https://github.com/ttaylorr/git/compare/tb/roaring-bitmaps
thanks. For anyone reading along, the changes to JGit are here

https://git.eclipse.org/r/c/jgit/jgit/+/203448

I was looking into this because I was hoping that roaring might
decrease peak memory usage.

I don't have firm evidence that it's better or worse, but I did
observe that runtime and memory usage during GC (which is heavy on
bitmap operations due to delta/xor encoding) was unchanged. That makes
me pessimistic that there are significant gains to be had.
The major advantage Roaring bitmaps have over EWAH and other
simple Run Length Encoding based compression algorithms is that
bitmap operations can be done on compressed bitmaps: there is no
need to uncompress bitmap to do (want1 OR want2 AND NOT have).
Are you sure? The source code for and and andNot look rather similar
in that they seem to do operations on whole RLE sections at a time,

https://sourcegraph.com/github.com/lemire/javaewah/-/blob/src/main/java/com/googlecode/javaewah/EWAHCompressedBitmap.java?L498

https://sourcegraph.com/github.com/lemire/javaewah/-/blob/src/main/java/com/googlecode/javaewah/EWAHCompressedBitmap.java?L405

Looking at the EWAH format as documented for git-bitmap-format, EWAH
allows for RLE on both 1s and 0s. It should be possible to efficiently
clear out a section of the target if the second operand of andNot has
RLE encoded run of 1s.
You are right, I seem to have misremembered the statement from the
EWAH paper.

Lemma 2 in "Sorting improves word-aligned bitmap indexes" (arXiv:0901.3751v7)
states that the bitmap operation of L bitmaps is computable in
O(L*compressed size),
and for updatable L-ary operation like symmetric boolean operation
the bitmap operation is computable in O(log(L)*compressed size).

Now I am not sure if I understand the code of Git correctly, but it seems like
in pack-bitmap.c the `add_commit_to_bitmap()` function stores the result
of OR operation on bitmaps as an uncompressed bitmap:
https://github.com/git/git/blob/master/pack-bitmap.c#L1030

Best,
-- 
Jakub Narębski
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help