Thread (17 messages) 17 messages, 5 authors, 2022-10-31

Re: [PATCH 3/5] roaring: teach Git to write roaring bitmaps

From: Abhradeep Chakraborty <hidden>
Date: 2022-09-30 16:24:03

On Fri, Sep 30, 2022 at 11:51 AM Junio C Hamano [off-list ref] wrote:
"Abhradeep Chakraborty via GitGitGadget" [off-list ref]
writes:
quoted
From: Abhradeep Chakraborty <redacted>

Roaring bitmaps are said to be more efficient (most of the time) than
ewah bitmaps. So Git might gain some optimization if it support roaring
bitmaps. As Roaring library has all the changes it needed to implement
roaring bitmaps in Git, Git can learn to write roaring bitmaps. However,
all the changes are backward-compatible.

Teach Git to write roaring bitmaps.
That is way underexplained.   At least cover what the plans are, so
that readers do not have to ask these questions:

 * When is the choice of bitmap type is made?  Is it fixed at
   repository initialization time and once chosen other kinds cannot
   be used?

 * Is the bitmap file self describing?  How does a reader know
   between ewah and roaring codepaths to use to read a given bitmap
   file?  Is there enough room for extending the set of bitmap
   formats, or we cannot add other formats easily?
Hey Junio,

First of all, sorry that the next version is taking so much time to
land. We have a festival ("Durga Puja"; it is the biggest festival for
Bengalis) going on here now. So I am not that active.

I will explain briefly in the next version.
Do you really need the global variable that holds the bitmap type?

Wouldn't it be easier to write code that needs to deal with both
types (e.g. in a repository with existing ewah bitmap, you want to
do a repack and index the result using the roaring bitmap) if you
passed the type through the callchain as a parameter?
I didn't want to go for "passing the type through the callchain as a
parameter" because that would cause changes to every affected function
definition. I found the "global variable" approach simpler for this
reason. Here we have to initialize the type once and the affected
functions will work accordingly.

If you like the "callchain" approach, I have no problem to implement it.
It may be that the codepath that reads from an existing bitmap file
says "ah, the file given to us seems to be in format X (either EWAH
or ROARING or perhaps something else), so let's call bitmap_init(X)
to obtain the in-core data structure to deal with that file".  When
that happens, you may probably need to have two cases in the default:
arm of this switch statement, i.e. one to diagnose a BUG() to pass
an uninitialized bitmap type to the codepath, and the other to
diagnose a runtime error() to have read a bitmap file whose format
this version of Git does not understand.
Ok, understood. Thanks.
These repetitive patterns makes me wonder if void *bitmap
is a good type to be passing around.  Shouldn't it be a struct with
its first member being a bitmap_type, and another member being what
these functions are passing to the underlying bitmap format specific
functions as "bitmap"?  E.g.

    void bitmap_unset(struct bitmap *bm, uint32_t i)
    {
        switch (bm->type) {
        case EWAH:
                ewah_bitmap_remove(bm->u.ewah, i);
                break;
        ...
Good idea! Thanks.
quoted
+
+enum bitmap_type {
+     INIT_BITMAP_TYPE = 0,
"UNINITIALIZED_BITMAP_TYPE", probably.
Ok.
quoted
+void *roaring_or_ewah_bitmap_init(void);
I would strongly suggest reconsider these names.  What if you later
want to add the third variant?  roaring_or_ewah_or_xyzzy_bitmap_init()?

Instead just use the most generic name, like "bitmap_init", perhaps
something along the lines of ...

    struct bitmap {
        enum bitmap_type type;
        union {
            struct ewah_bitmap *ewah;
            struct roaring_bitmap *roaring;
        } u;
    };

    struct bitmap *bitmap_new(enum bitmap_type type)
    {
        struct bitmap *bm = xmalloc(sizeof(*bm));

        bm->type = type;
        switch (bm->type) {
        case EWAH:
            bm->u.ewah = ewah_new();
            break;
        case ROARING:
            bm->u.roaring = roaring_bitmap_create();
            break;
        default:
            die(_("unknown bitmap type %d"), (int)type);
        }
        return bm;
    }
Got it. It seems a better option than the current one.

Thanks )
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help