Re: [PATCH 5/8] pack-bitmap: cache object positions during fill

[PATCH 0/8] pack-bitmap-write: speed up bitmap generation · Taylor Blau <hidden> · 2026-05-19
[PATCH 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` · Taylor Blau <hidden> · 2026-05-19
Re: [PATCH 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` · Jeff King <hidden> · 2026-05-27
Re: [PATCH 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` · Taylor Blau <hidden> · 2026-05-27
[PATCH 2/8] pack-bitmap: check subtree bits before recursing · Taylor Blau <hidden> · 2026-05-19
Re: [PATCH 2/8] pack-bitmap: check subtree bits before recursing · Jeff King <hidden> · 2026-05-27
Re: [PATCH 2/8] pack-bitmap: check subtree bits before recursing · Taylor Blau <hidden> · 2026-05-27
[PATCH 3/8] pack-bitmap: reuse stored selected bitmaps · Taylor Blau <hidden> · 2026-05-19
Re: [PATCH 3/8] pack-bitmap: reuse stored selected bitmaps · Jeff King <hidden> · 2026-05-27
Re: [PATCH 3/8] pack-bitmap: reuse stored selected bitmaps · Taylor Blau <hidden> · 2026-05-27
Re: [PATCH 3/8] pack-bitmap: reuse stored selected bitmaps · Jeff King <hidden> · 2026-05-29
[PATCH 4/8] pack-bitmap: consolidate `find_object_pos()` success path · Taylor Blau <hidden> · 2026-05-19
Re: [PATCH 4/8] pack-bitmap: consolidate `find_object_pos()` success path · SZEDER Gábor <hidden> · 2026-05-20
Re: [PATCH 4/8] pack-bitmap: consolidate `find_object_pos()` success path · Taylor Blau <hidden> · 2026-05-20
Re: [PATCH 4/8] pack-bitmap: consolidate `find_object_pos()` success path · Jeff King <hidden> · 2026-05-27
[PATCH 5/8] pack-bitmap: cache object positions during fill · Taylor Blau <hidden> · 2026-05-19
Re: [PATCH 5/8] pack-bitmap: cache object positions during fill · Jeff King <hidden> · 2026-05-27
Re: [PATCH 5/8] pack-bitmap: cache object positions during fill · Taylor Blau <hidden> · 2026-05-27
[PATCH 6/8] pack-bitmap: sort bitmaps before XORing · Taylor Blau <hidden> · 2026-05-19
Re: [PATCH 6/8] pack-bitmap: sort bitmaps before XORing · Jeff King <hidden> · 2026-05-27
Re: [PATCH 6/8] pack-bitmap: sort bitmaps before XORing · Taylor Blau <hidden> · 2026-05-27
Re: [PATCH 6/8] pack-bitmap: sort bitmaps before XORing · Jeff King <hidden> · 2026-05-29
[PATCH 7/8] pack-bitmap: remember pseudo-merge parents · Taylor Blau <hidden> · 2026-05-19
[PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps · Taylor Blau <hidden> · 2026-05-19
Re: [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps · Jeff King <hidden> · 2026-05-27
Re: [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps · Taylor Blau <hidden> · 2026-05-27
Re: [PATCH 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps · Jeff King <hidden> · 2026-05-29
Re: [PATCH 0/8] pack-bitmap-write: speed up bitmap generation · Jeff King <hidden> · 2026-05-27
[PATCH v2 0/8] pack-bitmap-write: speed up bitmap generation · Taylor Blau <hidden> · 2026-05-27
[PATCH v2 1/8] pack-bitmap: pass object position to `fill_bitmap_tree()` · Taylor Blau <hidden> · 2026-05-27
[PATCH v2 2/8] pack-bitmap: check subtree bits before recursing · Taylor Blau <hidden> · 2026-05-27
[PATCH v2 3/8] pack-bitmap: reuse stored selected bitmaps · Taylor Blau <hidden> · 2026-05-27
[PATCH v2 4/8] pack-bitmap: consolidate `find_object_pos()` success path · Taylor Blau <hidden> · 2026-05-27
[PATCH v2 5/8] pack-bitmap: cache object positions during fill · Taylor Blau <hidden> · 2026-05-27
[PATCH v2 6/8] pack-bitmap: sort bitmaps before XORing · Taylor Blau <hidden> · 2026-05-27
[PATCH v2 7/8] pack-bitmap: remember pseudo-merge parents · Taylor Blau <hidden> · 2026-05-27
[PATCH v2 8/8] pack-bitmap: build pseudo-merge bitmaps after regular bitmaps · Taylor Blau <hidden> · 2026-05-27
Re: [PATCH v2 0/8] pack-bitmap-write: speed up bitmap generation · Jeff King <hidden> · 2026-05-29

From: Taylor Blau <hidden>
Date: 2026-05-27 14:46:12

On Wed, May 27, 2026 at 05:45:35AM -0400, Jeff King wrote:

quoted

Combat this by adding a small, direct-mapped cache to the bitmap writer
which maps object IDs to their corresponding bit positions. Size the
cache according to the number of objects being written, with fixed lower
and upper bounds so small repositories do not pay for a large table and
large repositories can avoid most repeated packlist and MIDX lookups.

Introducing another layer of data structure feels so dirty, but it's
hard to argue with the numbers. We are looking up oids in the packlist,
so it's already O(lg n). Your cache here is essentially a hash lookup,
which is O(1)-ish (with collisions causing eviction rather than growth).
And it presumably works because there's a lot of locality in lookups
(between commits X and X^1, their top-level trees will be almost
identical but we have to resolve the bits to find out which entries are
new).

It does make me wonder if we'd see similar improvements if we just
turned the packlist into a regular hash table. Or maybe not, because
then we'd have to do actual probing.

I haven't run that experiment directly, but I share your suspicion. I
wrote a 2- and 4-way associative cache implementation as alternatives
before settling on the direct-mapped approach in this patch. I found
that associative caches regardless of cache lines nearly nuked any of
the performance gains that we got as a result of this patch.

So this really is a somewhat unique situation. It _might_ be applicable
for the reading side of bitmaps, though. When we do fill-in traversal we
end up with this same "read a tree, find the bit for each entry, and 99%
of the time find that it is already in the bitmap".

I think it's certainly likely. In my experience, many object to
bit-position queries are extremely cache-friendly. And in practice, many
large repositories have MIDXs with many tens of millions of objects. So
even on a O(log n) lookup, caching seems to help a lot.

quoted

In our example repository from above and in earlier commits, this
results in a ~9.4% reduction in runtime relative to the previous commit:

    +------------------+-------------+-------------+---------------------+
    |                  | HEAD^       | HEAD        | Delta               |
    +------------------+-------------+-------------+---------------------+
    | elapsed          |   324.8 s   |   294.1 s   |    -30.7 s  (-9.4%) |
    | cycles           | 1,508.6 B   | 1,365.5 B   |   -143.0 B  (-9.5%) |
    | instructions     | 1,436.6 B   | 1,389.8 B   |    -46.9 B  (-3.3%) |
    | CPI              |     1.050   |     0.983   |   -0.068    (-6.4%) |
    +------------------+-------------+-------------+---------------------+

I show a 26% speed up on linux.git (1m37 down to 1m12). Very cool.

Glad it reproduces ;-).

quoted

+static uint32_t store_cached_object_pos(struct bitmap_writer *writer,
+					const struct object_id *oid,
+					uint32_t pos)
+{
+	size_t slot;
+
+	if (pos & BITMAP_POS_CACHE_VALID)
+		return pos; /* too large to cache */

Cute, I wondered what would happen if we went past 2^31. I suspect there
are other parts of the code that do not behave that well around that
size, but it is good that we are not introducing any new surprises.

Yeah, I suspect that there are many breakages past the 32-bit unsigned
maximum number of objects. I figure the easiest thing to do in this
patch is to avoid making that situation worse by simply not caching
objects that are near that limit.

Thanks,
Taylor

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help