Thread (38 messages) 38 messages, 3 authors, 1d ago

Re: [PATCH 5/8] pack-bitmap: cache object positions during fill

From: Taylor Blau <hidden>
Date: 2026-05-27 14:46:12

On Wed, May 27, 2026 at 05:45:35AM -0400, Jeff King wrote:
quoted
Combat this by adding a small, direct-mapped cache to the bitmap writer
which maps object IDs to their corresponding bit positions. Size the
cache according to the number of objects being written, with fixed lower
and upper bounds so small repositories do not pay for a large table and
large repositories can avoid most repeated packlist and MIDX lookups.
Introducing another layer of data structure feels so dirty, but it's
hard to argue with the numbers. We are looking up oids in the packlist,
so it's already O(lg n). Your cache here is essentially a hash lookup,
which is O(1)-ish (with collisions causing eviction rather than growth).
And it presumably works because there's a lot of locality in lookups
(between commits X and X^1, their top-level trees will be almost
identical but we have to resolve the bits to find out which entries are
new).

It does make me wonder if we'd see similar improvements if we just
turned the packlist into a regular hash table. Or maybe not, because
then we'd have to do actual probing.
I haven't run that experiment directly, but I share your suspicion. I
wrote a 2- and 4-way associative cache implementation as alternatives
before settling on the direct-mapped approach in this patch. I found
that associative caches regardless of cache lines nearly nuked any of
the performance gains that we got as a result of this patch.
So this really is a somewhat unique situation. It _might_ be applicable
for the reading side of bitmaps, though. When we do fill-in traversal we
end up with this same "read a tree, find the bit for each entry, and 99%
of the time find that it is already in the bitmap".
I think it's certainly likely. In my experience, many object to
bit-position queries are extremely cache-friendly. And in practice, many
large repositories have MIDXs with many tens of millions of objects. So
even on a O(log n) lookup, caching seems to help a lot.
quoted
In our example repository from above and in earlier commits, this
results in a ~9.4% reduction in runtime relative to the previous commit:

    +------------------+-------------+-------------+---------------------+
    |                  | HEAD^       | HEAD        | Delta               |
    +------------------+-------------+-------------+---------------------+
    | elapsed          |   324.8 s   |   294.1 s   |    -30.7 s  (-9.4%) |
    | cycles           | 1,508.6 B   | 1,365.5 B   |   -143.0 B  (-9.5%) |
    | instructions     | 1,436.6 B   | 1,389.8 B   |    -46.9 B  (-3.3%) |
    | CPI              |     1.050   |     0.983   |   -0.068    (-6.4%) |
    +------------------+-------------+-------------+---------------------+
I show a 26% speed up on linux.git (1m37 down to 1m12). Very cool.
Glad it reproduces ;-).
quoted
+static uint32_t store_cached_object_pos(struct bitmap_writer *writer,
+					const struct object_id *oid,
+					uint32_t pos)
+{
+	size_t slot;
+
+	if (pos & BITMAP_POS_CACHE_VALID)
+		return pos; /* too large to cache */
Cute, I wondered what would happen if we went past 2^31. I suspect there
are other parts of the code that do not behave that well around that
size, but it is good that we are not introducing any new surprises.
Yeah, I suspect that there are many breakages past the 32-bit unsigned
maximum number of objects. I figure the easiest thing to do in this
patch is to avoid making that situation worse by simply not caching
objects that are near that limit.

Thanks,
Taylor
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help