Thread (12 messages) 12 messages, 2 authors, 2025-12-18

Re: [PATCH 1/2] midx: fix `BUG()` when getting preferred pack without a reverse index

From: Taylor Blau <hidden>
Date: 2025-12-10 00:22:13

On Mon, Dec 08, 2025 at 07:27:14PM +0100, Patrick Steinhardt wrote:
The function `midx_preferred_pack()` returns the preferred pack for a
given multi-pack index. To compute the preferred pack we:

  1. Look up the position of the first object indexed by the multi-pack
     index.

  2. Convert this position from pseudo-pack order into MIDX order.

  3. We then look up pack that corresponds to this MIDX index.
I think the implementation of midx_preferred_pack() works a little bit
differently than is described here. I often get confused when working in
this area juggling between the various object/pack orderings in my head.

midx_preferred_pack() cares about converting from the first position in
pseudo-pack order back into MIDX object order. To do that, we convert
the pseudo-pack position into a MIDX one, and then lookup the pack that
represents that object.

Effectively we're doing something like:

    uint32_t pseudo_pack_pos = m->num_objects_in_base;
    uint32_t midx_pos = pack_pos_to_midx(m, pseudo_pack_pos);
    uint32_t pack_int_id = nth_midxed_pack_int_id(m, midx_pos);
This reliably returns the preferred pack given that all of its contained
objects will be up front in pseudo-pack order.

The second step that turns the pseudo-pack order into MIDX order
requires the reverse index though, which may not exist for example when
the MIDX does not have a bitmap. And in that case one may easily hit a
bug:

    BUG: ../pack-revindex.c:491: pack_pos_to_midx: reverse index not yet loaded
That makes sense, we can't convert a pseudo-pack position into a
MIDX-relative position without a reverse index to tell us where that bit
goes.
In theory, `midx_preferred_pack()` already knows to handle the case
where no reverse index exists, as it calls `load_midx_revindex()` before
calling into `midx_preferred_pack()`. [...]
Right, it handles this case by returning -1 to indicate that it didn't
have enough information to determine the preferred pack.
[...] But we only check for negative
return values there, even though the function returns a positive error
code in case the reverse index does not exist.
Ah. It looks like that was changed in 5a6072f631d (fsck: validate .rev
file header, 2023-04-17), but it looks like the caller here did not
learn about that change. It may be worth mentioning that commit in your
patch message.

While reviewing, I wanted to make sure that there weren't any other
callers of load_midx_revindex() that were also missing this check. The
return value of that function is propagated through the two expected
functions:

    $ git grep -p load_revindex_from_disk
    pack-revindex.c=struct revindex_header {
    pack-revindex.c:static int load_revindex_from_disk(const struct git_hash_algo *algo,
    pack-revindex.c=int load_pack_revindex_from_disk(struct packed_git *p)
    pack-revindex.c:        ret = load_revindex_from_disk(p->repo->hash_algo,
    pack-revindex.c=int load_midx_revindex(struct multi_pack_index *m)
    pack-revindex.c:        ret = load_revindex_from_disk(m->source->odb->repo->hash_algo,

, and checking through the callers of those two functions, all are
prepared to handle a >0 return value.
quoted hunk ↗ jump to hunk
diff --git a/midx.c b/midx.c
index 24e1e72175..b681b18fc1 100644
--- a/midx.c
+++ b/midx.c
@@ -686,7 +686,7 @@ int midx_preferred_pack(struct multi_pack_index *m, uint32_t *pack_int_id)
 {
 	if (m->preferred_pack_idx == -1) {
 		uint32_t midx_pos;
-		if (load_midx_revindex(m) < 0) {
+		if (load_midx_revindex(m)) {
 			m->preferred_pack_idx = -2;
 			return -1;
 		}
diff --git a/pack-revindex.h b/pack-revindex.h
index 422c2487ae..0042892091 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -72,7 +72,8 @@ int verify_pack_revindex(struct packed_git *p);
  * multi-pack index by mmap-ing it and assigning pointers in the
  * multi_pack_index to point at it.
  *
- * A negative number is returned on error.
+ * A negative number is returned on error. A positive number is returned in
+ * case the multi-pack-index does not have a reverse index.
Makes sense.
quoted hunk ↗ jump to hunk
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 93f319a4b2..9492a9737b 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -350,7 +350,20 @@ test_expect_success 'preferred pack from existing MIDX without bitmaps' '
 		# the new MIDX
 		git multi-pack-index write --preferred-pack=pack-$pack.pack
 	)
+'

+test_expect_success 'preferred pack cannot be determined without bitmap' '
+	test_when_finished "rm -fr preferred-can-be-queried" &&
+	git init preferred-can-be-queried &&
+	(
+		cd preferred-can-be-queried &&
+		test_commit initial &&
+		git repack -Adl --write-midx --no-write-bitmap-index &&
+		test_must_fail test-tool read-midx --preferred-pack .git/objects 2>err &&
+		test_grep "could not determine MIDX preferred pack" err &&
Looks good. I think that it's fine to end the test here, since we have
extensive coverage that we can determine the preferred pack when there
is a MIDX and matching bitmap, but I definitely don't feel strongly
about it.

Thanks,
Taylor
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help