Re: [PATCH 5/8] pack-bitmap: cache object positions during fill
From: Taylor Blau <hidden>
Date: 2026-05-27 14:46:12
On Wed, May 27, 2026 at 05:45:35AM -0400, Jeff King wrote:
quoted
Combat this by adding a small, direct-mapped cache to the bitmap writer which maps object IDs to their corresponding bit positions. Size the cache according to the number of objects being written, with fixed lower and upper bounds so small repositories do not pay for a large table and large repositories can avoid most repeated packlist and MIDX lookups.Introducing another layer of data structure feels so dirty, but it's hard to argue with the numbers. We are looking up oids in the packlist, so it's already O(lg n). Your cache here is essentially a hash lookup, which is O(1)-ish (with collisions causing eviction rather than growth). And it presumably works because there's a lot of locality in lookups (between commits X and X^1, their top-level trees will be almost identical but we have to resolve the bits to find out which entries are new). It does make me wonder if we'd see similar improvements if we just turned the packlist into a regular hash table. Or maybe not, because then we'd have to do actual probing.
I haven't run that experiment directly, but I share your suspicion. I wrote a 2- and 4-way associative cache implementation as alternatives before settling on the direct-mapped approach in this patch. I found that associative caches regardless of cache lines nearly nuked any of the performance gains that we got as a result of this patch.
So this really is a somewhat unique situation. It _might_ be applicable for the reading side of bitmaps, though. When we do fill-in traversal we end up with this same "read a tree, find the bit for each entry, and 99% of the time find that it is already in the bitmap".
I think it's certainly likely. In my experience, many object to bit-position queries are extremely cache-friendly. And in practice, many large repositories have MIDXs with many tens of millions of objects. So even on a O(log n) lookup, caching seems to help a lot.
quoted
In our example repository from above and in earlier commits, this results in a ~9.4% reduction in runtime relative to the previous commit: +------------------+-------------+-------------+---------------------+ | | HEAD^ | HEAD | Delta | +------------------+-------------+-------------+---------------------+ | elapsed | 324.8 s | 294.1 s | -30.7 s (-9.4%) | | cycles | 1,508.6 B | 1,365.5 B | -143.0 B (-9.5%) | | instructions | 1,436.6 B | 1,389.8 B | -46.9 B (-3.3%) | | CPI | 1.050 | 0.983 | -0.068 (-6.4%) | +------------------+-------------+-------------+---------------------+I show a 26% speed up on linux.git (1m37 down to 1m12). Very cool.
Glad it reproduces ;-).
quoted
+static uint32_t store_cached_object_pos(struct bitmap_writer *writer, + const struct object_id *oid, + uint32_t pos) +{ + size_t slot; + + if (pos & BITMAP_POS_CACHE_VALID) + return pos; /* too large to cache */Cute, I wondered what would happen if we went past 2^31. I suspect there are other parts of the code that do not behave that well around that size, but it is good that we are not introducing any new surprises.
Yeah, I suspect that there are many breakages past the 32-bit unsigned maximum number of objects. I figure the easiest thing to do in this patch is to avoid making that situation worse by simply not caching objects that are near that limit. Thanks, Taylor