Re: [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index
From: Taylor Blau <hidden>
Date: 2025-12-10 00:22:13
On Mon, Dec 08, 2025 at 07:27:14PM +0100, Patrick Steinhardt wrote:
The function `midx_preferred_pack()` returns the preferred pack for a
given multi-pack index. To compute the preferred pack we:
1. Look up the position of the first object indexed by the multi-pack
index.
2. Convert this position from pseudo-pack order into MIDX order.
3. We then look up pack that corresponds to this MIDX index.
I think the implementation of midx_preferred_pack() works a little bit
differently than is described here. I often get confused when working in
this area juggling between the various object/pack orderings in my head.
midx_preferred_pack() cares about converting from the first position in
pseudo-pack order back into MIDX object order. To do that, we convert
the pseudo-pack position into a MIDX one, and then lookup the pack that
represents that object.
Effectively we're doing something like:
uint32_t pseudo_pack_pos = m->num_objects_in_base;
uint32_t midx_pos = pack_pos_to_midx(m, pseudo_pack_pos);
uint32_t pack_int_id = nth_midxed_pack_int_id(m, midx_pos);
This reliably returns the preferred pack given that all of its contained
objects will be up front in pseudo-pack order.
The second step that turns the pseudo-pack order into MIDX order
requires the reverse index though, which may not exist for example when
the MIDX does not have a bitmap. And in that case one may easily hit a
bug:
BUG: ../pack-revindex.c:491: pack_pos_to_midx: reverse index not yet loadedThat makes sense, we can't convert a pseudo-pack position into a MIDX-relative position without a reverse index to tell us where that bit goes.
In theory, `midx_preferred_pack()` already knows to handle the case where no reverse index exists, as it calls `load_midx_revindex()` before calling into `midx_preferred_pack()`. [...]
Right, it handles this case by returning -1 to indicate that it didn't have enough information to determine the preferred pack.
[...] But we only check for negative return values there, even though the function returns a positive error code in case the reverse index does not exist.
Ah. It looks like that was changed in 5a6072f631d (fsck: validate .rev
file header, 2023-04-17), but it looks like the caller here did not
learn about that change. It may be worth mentioning that commit in your
patch message.
While reviewing, I wanted to make sure that there weren't any other
callers of load_midx_revindex() that were also missing this check. The
return value of that function is propagated through the two expected
functions:
$ git grep -p load_revindex_from_disk
pack-revindex.c=struct revindex_header {
pack-revindex.c:static int load_revindex_from_disk(const struct git_hash_algo *algo,
pack-revindex.c=int load_pack_revindex_from_disk(struct packed_git *p)
pack-revindex.c: ret = load_revindex_from_disk(p->repo->hash_algo,
pack-revindex.c=int load_midx_revindex(struct multi_pack_index *m)
pack-revindex.c: ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,
, and checking through the callers of those two functions, all are
prepared to handle a >0 return value.
quoted hunk ↗ jump to hunk
diff --git a/midx.c b/midx.c index 24e1e72175..b681b18fc1 100644 --- a/midx.c +++ b/midx.c@@ -686,7 +686,7 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id) { if (m->preferred_pack_idx == -1) { uint32_t midx_pos; - if (load_midx_revindex(m) < 0) { + if (load_midx_revindex(m)) { m->preferred_pack_idx = -2; return -1; }diff --git a/pack-revindex.h b/pack-revindex.h index 422c2487ae..0042892091 100644 --- a/pack-revindex.h +++ b/pack-revindex.h@@ -72,7 +72,8 @@ int verify_pack_revindex(struct packed_git *p); * multi-pack index by mmap-ing it and assigning pointers in the * multi_pack_index to point at it. * - * A negative number is returned on error. + * A negative number is returned on error. A positive number is returned in + * case the multi-pack-index does not have a reverse index.
Makes sense.
quoted hunk ↗ jump to hunk
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh index 93f319a4b2..9492a9737b 100755 --- a/t/t5319-multi-pack-index.sh +++ b/t/t5319-multi-pack-index.sh@@ -350,7 +350,20 @@ test_expect_success 'preferred pack from existing MIDX without bitmaps' ' # the new MIDX git multi-pack-index write --preferred-pack=pack-$pack.pack ) +' +test_expect_success 'preferred pack cannot be determined without bitmap' ' + test_when_finished "rm -fr preferred-can-be-queried" && + git init preferred-can-be-queried && + ( + cd preferred-can-be-queried && + test_commit initial && + git repack -Adl --write-midx --no-write-bitmap-index && + test_must_fail test-tool read-midx --preferred-pack .git/objects 2>err && + test_grep "could not determine MIDX preferred pack" err &&
Looks good. I think that it's fine to end the test here, since we have extensive coverage that we can determine the preferred pack when there is a MIDX and matching bitmap, but I definitely don't feel strongly about it. Thanks, Taylor