Thread (33 messages) 33 messages, 3 authors, 12d ago

Re: [PATCH 0/3] pack-objects: support bitmaps and delta-islands with `--path-walk`

From: Derrick Stolee <hidden>
Date: 2026-05-29 21:28:34

On 5/29/2026 4:07 PM, Taylor Blau wrote:
On Fri, May 29, 2026 at 01:26:33PM -0400, Derrick Stolee wrote:
quoted
On 5/28/26 11:28 AM, Derrick Stolee wrote:
quoted
On 5/27/26 7:18 PM, Taylor Blau wrote:
quoted
Do you have any end-to-end performance data to demonstrate that these
changes are effective at scale? Are we still producing packfiles with the
pack-file compression and now with .bitmap files? How does this impact
the performance of a clone or fetch when using a bitmap index at read
time?
Here's my attempt to use our existing performance tests to analyze the
impact of this series.

Running p5311 against the base of this topic and this topic with
GIT_TEST_PACK_PATH_WALK=1, I get this output:
Yikes. That's not great, but see below for what I think is going on.

(As an aside, we can focus in on either lookup=true or lookup=false,
since these are just controlling whether or not the bitmap lookup table
is written. On a repository as small as git.git, this shouldn't make a
huge difference either way. I have a separate series to make this the
default and to clean up the t/perf suite accordingly, but haven't sent
it to the list yet.)
quoted
Test                                     HEAD~3    HEAD
-----------------------------------------------------------------
[...]
I think this is either the primary reason why you're not seeing an
improvement here, or at least related to it...
quoted
Do you have a good feeling for why the path-walk feature doesn't
make a huge change in these test scenarios?
I think the problem is that we're relying on the TEST_ variable to tell
pack-objects to generate a pack using --path-walk, but treat it as a
fallback.

I suspect that since p5311 invokes repack *without* the '--path-walk'
option, we end up in this case within cmd_pack_objects():

	if (path_walk < 0) {
		if (use_bitmap_index > 0 ||
		    !use_internal_rev_list)
			path_walk = 0;
		else if (the_repository->gitdir &&
			 the_repository->settings.pack_use_path_walk)
			path_walk = 1;
		else
			path_walk = git_env_bool("GIT_TEST_PACK_PATH_WALK", 0);
	}

, where `path_walk` is *not* set, but `use_bitmap_index` is since p5311
set the 'pack.writeBitmaps' configuration (as a separate aside, this
should really prefer the non-deprecated 'repack.writeBitmaps' variant).
I see. When `--use-bitmap-index` is specified, then the test variable is
ignored. So my tests aren't actually measuring the intended end state.
quoted hunk ↗ jump to hunk
So in that case, we fall back to `path_walk = 0`, and don't even bother
reading the `GIT_TEST_PACK_PATH_WALK` variable.

If I modify the perf test like so:
--- 8< ---
diff --git a/t/perf/p5311-pack-bitmaps-fetch.sh b/t/perf/p5311-pack-bitmaps-fetch.sh
index 047efb995d6..1c9c99216e3 100755
--- a/t/perf/p5311-pack-bitmaps-fetch.sh
+++ b/t/perf/p5311-pack-bitmaps-fetch.sh
@@ -13,7 +13,7 @@ test_fetch_bitmaps () {
 	test_expect_success 'create bitmapped server repo' '
 		git config pack.writebitmaps true &&
 		git config pack.writeBitmapLookupTable '"$1"' &&
-		git repack -ad
+		git repack -ad --path-walk
 	'

 	# simulate a fetch from a repository that last fetched N days ago, for
--- >8 ---
, then I can get significantly improved results when running without the
GIT_TEST_PACK_PATH_WALK variablle (here I'm truncating the
'lookup=false' case, which performs nearly identically):

    Test                                       HEAD~3            HEAD
    ------------------------------------------------------------------------------------
    5311.4: server   (1 days) (lookup=true)    2.57(2.52+0.04)   0.03(0.02+0.00) -98.8%
    5311.5: size     (1 days)                           153.4K            153.4K +0.0%
    5311.6: client   (1 days) (lookup=true)    0.00(0.01+0.00)   0.00(0.01+0.00) =
    5311.8: server   (2 days) (lookup=true)    2.60(2.55+0.04)   0.02(0.02+0.00) -99.2%
    5311.9: size     (2 days)                           153.4K            153.4K +0.0%
    5311.10: client   (2 days) (lookup=true)   0.00(0.01+0.00)   0.00(0.01+0.00) =
    5311.12: server   (4 days) (lookup=true)   2.60(2.54+0.05)   0.03(0.03+0.00) -98.8%
    5311.13: size     (4 days)                          209.0K            209.0K +0.0%
    5311.14: client   (4 days) (lookup=true)   0.01(0.02+0.00)   0.01(0.01+0.00) +0.0%
    5311.16: server   (8 days) (lookup=true)   2.58(2.53+0.04)   0.03(0.03+0.00) -98.8%
    5311.17: size     (8 days)                          209.0K            209.0K +0.0%
    5311.18: client   (8 days) (lookup=true)   0.01(0.01+0.00)   0.01(0.02+0.00) +0.0%
    5311.20: server  (16 days) (lookup=true)   2.58(2.52+0.05)   0.03(0.03+0.00) -98.8%
    5311.21: size    (16 days)                          209.0K            209.0K +0.0%
    5311.22: client  (16 days) (lookup=true)   0.01(0.02+0.00)   0.01(0.01+0.00) +0.0%
    5311.24: server  (32 days) (lookup=true)   2.61(2.58+0.03)   0.03(0.02+0.01) -98.9%
    5311.25: size    (32 days)                          212.9K            212.9K +0.0%
    5311.26: client  (32 days) (lookup=true)   0.01(0.02+0.00)   0.02(0.02+0.00) +100.0%
    5311.28: server  (64 days) (lookup=true)   2.72(2.79+0.06)   0.19(0.30+0.03) -93.0%
    5311.29: size    (64 days)                            4.5M              4.5M -0.0%
    5311.30: client  (64 days) (lookup=true)   0.49(0.58+0.02)   0.48(0.56+0.04) -2.0%
    5311.32: server (128 days) (lookup=true)   2.90(3.21+0.09)   0.35(0.70+0.04) -87.9%
    5311.33: size   (128 days)                            9.4M              9.5M +0.4%
    5311.34: client (128 days) (lookup=true)   0.98(1.27+0.08)   0.98(1.33+0.06) +0.0%

My reading here is that we get significantly smaller packs (i.e. all
'test_size' tests drop from HEAD~3 to HEAD) in the same amount of time
(i.e. that all 'test_perf' tests are roughly flat).
The sizes don't shrink, and in one case increases by a small amount. I'm
happy to count those cases as noise from multi-threaded delta calculations
being less deterministic.

The _time_ taken to compute the packfiles is what decreases, though, which
is promising.
That lines up with my expectation here, which is that even though we're
using bitmaps at read time, that's effectively seeding the verbatim pack
reuse over a significantly smaller pack, producing a much smaller output
pack as a result.
Can you double-check this reasoning with my read of the data? The size
isn't changing, but the computation time is.
quoted hunk ↗ jump to hunk
As to whether we should modify the perf suite to test this, naturally I
think we should. Likely that looks like modifying p5313 to re-run
`test_all_with_args` with '--use-bitmap-index' after repacking with
--path-walk and generating bitmaps like so (untested):
--- 8< ---
diff --git a/t/perf/p5313-pack-objects.sh b/t/perf/p5313-pack-objects.sh
index 46a6cd32d24..663717982b1 100755
--- a/t/perf/p5313-pack-objects.sh
+++ b/t/perf/p5313-pack-objects.sh
@@ -22,6 +22,21 @@ test_expect_success 'create rev input' '
 	EOF
 '

+test_repack_with_args () {
+	args="$@"
+	export args
+
+	test_perf "repack with $args" '
+		git repack -adf $args
+	'
+
+	test_size "repack size with $args" '
+		gitdir=$(git rev-parse --git-dir) &&
+		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
+		test_file_size "$pack"
+	'
+}
+
I see that these tests are extracted from test_all_... below:
quoted hunk ↗ jump to hunk
 test_all_with_args () {
 	parameter=$1
 	export parameter
@@ -52,23 +67,22 @@ test_all_with_args () {
 	test_size "shallow pack size with $parameter" '
 		test_file_size out
 	'
-
-	test_perf "repack with $parameter" '
-		git repack -adf $parameter
-	'
-
-	test_size "repack size with $parameter" '
-		gitdir=$(git rev-parse --git-dir) &&
-		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
-		test_file_size "$pack"
-	'
 }
Because the --use-bitmap-index and --write-bitmap-index args
aren't appropriate across these different commands.

nit: the diff would be more obvious if test_repack_with_args
was defined after test_all_with_args so the hunk of existing
tests wouldn't appear in the diff.
 for version in 1 2
 do
-	test_all_with_args --name-hash-version=$version
+	arg="--name-hash-version=$version" &&
+
+	test_all_with_args "$arg" &&
+	test_repack_with_args "$arg" || return 1
 done

 test_all_with_args --path-walk
+test_repack_with_args --path-walk
+
+# inverted order here: we want to test using reachability bitmaps on a
+# pack written with --path-walk
+test_repack_with_args --path-walk --write-bitmap-index
+test_all_with_args --use-bitmap-index
So this allows us to test all of the different modes.
quoted hunk ↗ jump to hunk
--- >8 ---
I don't have a strong opinion on whether or not we should include that
in this series or elsewhere.
I'm interested to see some results of your new p5313 test
to make sure that we are getting expected size changes for
the repack, since the p5311 tests were more focused on the
thin fetch pack (and didn't show a change in size).

For that, I'd be interested to see this test be included in
a patch for future reference, too.

Thanks,
-Stolee
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help