
[PATCH v3 00/10] builtin/maintenance: introduce "geometric" strategy

From: Patrick Steinhardt <hidden>
Date: 2025-10-24 06:57:21

Hi,

by default, git-maintenance(1) uses git-gc(1) to perform repository
housekeeping. This tool has a couple of shortcomings, most importantly
that it regularly does all-into-one repacks. This does not work well in
the context of monorepos, where you want to avoid repacking all objects
regularly.

An alternative maintenance strategy is the "incremental" strategy, but
this strategy has two downsides:

  - Strategies in general only apply to scheduled maintenance. So if you
    run git-maintenance(1) manually, you still end up with git-gc(1).

  - The strategy is designed to not ever delete any data, but a full
    replacement for git-gc(1) needs to also prune reflogs, rerere caches
    and vanished worktrees.

This patch series aims to fix both of these issues.

First, the series introduces a new "geometric" maintenance task, which
makes use of geometric repacking as exposed by git-repack(1) in the
general case. In the case where a geometric repack ends up merging all
packfiles into one, we instead do an all-into-one repack with cruft packs
so that we can still phase out objects over time.
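
To illustrate the decision described above, here is a simplified sketch
of how a geometric split point can be computed from per-pack object
counts. It is not the code from git-repack(1) or from this series: the
function name `geometric_split_sketch` and the plain-array input are
assumptions made for illustration, and overflow handling is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Git. `sizes` holds object counts of
 * all packs, sorted in ascending order. Returns how many of the smallest
 * packs would have to be merged so that the remaining packs form a
 * geometric progression with the given factor. A return value equal to
 * `nr` means that everything would be merged into a single pack.
 */
size_t geometric_split_sketch(const uint64_t *sizes, size_t nr,
			      uint64_t factor)
{
	uint64_t merged = 0;
	size_t split, i;

	if (nr < 2)
		return 0;

	/* Walk down from the largest pack until the progression breaks. */
	for (split = nr - 1; split > 0; split--)
		if (sizes[split] < factor * sizes[split - 1])
			break;

	/* All packs already form a geometric progression. */
	if (!split)
		return 0;

	/* Packs [0, split) are rolled up into one new pack. */
	for (i = 0; i < split; i++)
		merged += sizes[i];

	/*
	 * The new pack may itself violate the progression with regard
	 * to the next larger pack, so keep absorbing packs until it
	 * no longer does.
	 */
	while (split < nr && sizes[split] < factor * merged)
		merged += sizes[split++];

	return split;
}
```

With object counts {5, 5, 5, 100} and a factor of 2 the sketch returns 3,
so only the three small packs are merged. With {10, 10, 10} it returns 3
as well, which equals the number of packs; that is the situation in which
the new task falls back to an all-into-one repack with cruft packs.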

Second, the series extends maintenance strategies to also cover normal
maintenance. If the user has configured the "geometric" strategy, we'll
thus use it for both manual and scheduled maintenance. For backwards
compatibility, the "incremental" strategy is changed so that it uses
git-gc(1) for manual maintenance and the other tasks for scheduled
maintenance.
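
As a rough sketch of how a per-task type bitmask can express this, the
following reduced example selects tasks depending on whether a run is
manual or scheduled. The enum names `MAINTENANCE_TYPE_SCHEDULED` and
`MAINTENANCE_TYPE_MANUAL` and the task names appear in the series; the
bit values, the trimmed task list, `struct strategy_sketch` and
`task_selected()` are assumptions made for illustration only.

```c
#include <stddef.h>

/* Bit values are assumed; only the names are taken from the series. */
enum maintenance_type {
	MAINTENANCE_TYPE_SCHEDULED = (1 << 0),
	MAINTENANCE_TYPE_MANUAL = (1 << 1),
};

/* Reduced task list; the real list in builtin/gc.c is longer. */
enum task_id {
	TASK_GC,
	TASK_COMMIT_GRAPH,
	TASK_GEOMETRIC_REPACK,
	TASK__COUNT,
};

/* Hypothetical, stripped-down stand-in for a maintenance strategy. */
struct strategy_sketch {
	unsigned type[TASK__COUNT];
};

/* "incremental": git-gc(1) for manual runs, other tasks when scheduled. */
static const struct strategy_sketch incremental_sketch = {
	.type = {
		[TASK_GC] = MAINTENANCE_TYPE_MANUAL,
		[TASK_COMMIT_GRAPH] = MAINTENANCE_TYPE_SCHEDULED,
	},
};

/* "geometric": the same tasks for both manual and scheduled runs. */
static const struct strategy_sketch geometric_sketch = {
	.type = {
		[TASK_COMMIT_GRAPH] = MAINTENANCE_TYPE_SCHEDULED |
				      MAINTENANCE_TYPE_MANUAL,
		[TASK_GEOMETRIC_REPACK] = MAINTENANCE_TYPE_SCHEDULED |
					  MAINTENANCE_TYPE_MANUAL,
	},
};

/* Returns non-zero if the strategy enables `task` for this kind of run. */
int task_selected(const struct strategy_sketch *strategy,
		  enum task_id task, enum maintenance_type run)
{
	return (strategy->type[task] & run) != 0;
}
```

In this sketch, a manual run with the "incremental" strategy selects only
`TASK_GC`, whereas the "geometric" strategy selects the same tasks for
both kinds of run and never selects `TASK_GC`.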

The series is built on top of b660e2dcb9 (Sync with 'maint', 2025-10-14)
with tb/incremental-midx-part-3.1 at c886af90f8 (SQUASH??? play well
with other topics by preemptively including "repository.h", 2025-09-29)
merged into it.

Changes in v3:
  - More line wrapping.
  - Improve readability of maintenance strategies by using nested
    designated initializers.
  - Use git-count-objects(1) to count loose objects.
  - Link to v2: https://lore.kernel.org/r/20251021-pks-maintenance-geometric-strategy-v2-0-f0d727832b80@pks.im

Changes in v2:
  - Make the geometric factor configurable via
    "maintenance.geometric-repack.splitFactor".
  - Wrap some overly long lines in our tests.
  - Link to v1: https://lore.kernel.org/r/20251016-pks-maintenance-geometric-strategy-v1-0-18943d474203@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (10):
      builtin/gc: remove global `repack` variable
      builtin/gc: make `too_many_loose_objects()` reusable without GC config
      builtin/maintenance: introduce "geometric-repack" task
      builtin/maintenance: make the geometric factor configurable
      builtin/maintenance: don't silently ignore invalid strategy
      builtin/maintenance: improve readability of strategies
      builtin/maintenance: run maintenance tasks depending on type
      builtin/maintenance: extend "maintenance.strategy" to manual maintenance
      builtin/maintenance: make "gc" strategy accessible
      builtin/maintenance: introduce "geometric" strategy

 Documentation/config/maintenance.adoc |  49 +++++-
 builtin/gc.c                          | 313 ++++++++++++++++++++++++++++------
 t/t7900-maintenance.sh                | 245 ++++++++++++++++++++++++++
 3 files changed, 544 insertions(+), 63 deletions(-)

Range-diff versus v2:

 1:  b853ba54dca =  1:  c35408a33d0 builtin/gc: remove global `repack` variable
 2:  9bbdfe1b9e5 =  2:  be572fe1542 builtin/gc: make `too_many_loose_objects()` reusable without GC config
 3:  bcd82ad038e !  3:  5290f6d3e0f builtin/maintenance: introduce "geometric-repack" task
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +	test_line_count = "$EXPECTED_PACKS" packfiles &&
     +
     +	# And verify that there are no loose objects anymore.
    -+	cat >expect <<-\EOF &&
    -+	info
    -+	pack
    -+	EOF
    -+	ls .git/objects >actual &&
    -+	test_cmp expect actual
    ++	git count-objects -v >count &&
    ++	test_grep '^count: 0$' count
     +}
     +
     +test_expect_success 'geometric repacking task' '
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +		# The initial repack causes an all-into-one repack.
     +		GIT_TRACE2_EVENT="$(pwd)/initial-repack.txt" \
     +			git maintenance run --task=geometric-repack 2>/dev/null &&
    -+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <initial-repack.txt &&
    ++		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
    ++			--quiet --write-midx <initial-repack.txt &&
     +
     +		# Repacking should now cause a no-op geometric repack because
     +		# no packfiles need to be combined.
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +		# an all-into-one-repack.
     +		GIT_TRACE2_EVENT="$(pwd)/all-into-one-repack.txt" \
     +			git maintenance run --task=geometric-repack 2>/dev/null &&
    -+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <all-into-one-repack.txt &&
    ++		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
    ++			--quiet --write-midx <all-into-one-repack.txt &&
     +
     +		# The geometric repack soaks up unreachable objects.
     +		echo blob-1 | git hash-object -w --stdin -t blob &&
    @@ t/t7900-maintenance.sh: test_expect_success 'maintenance.incremental-repack.auto
     +		run_and_verify_geometric_pack 3 &&
     +		GIT_TRACE2_EVENT="$(pwd)/cruft-repack.txt" \
     +			git maintenance run --task=geometric-repack 2>/dev/null &&
    -+		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago --quiet --write-midx <cruft-repack.txt &&
    ++		test_subcommand git repack -d -l --cruft --cruft-expiration=2.weeks.ago \
    ++			--quiet --write-midx <cruft-repack.txt &&
     +		ls .git/objects/pack/*.pack >packs &&
     +		test_line_count = 2 packs &&
     +		ls .git/objects/pack/*.mtimes >cruft &&
 4:  cb10031cc7c =  4:  7f2067fa4ec builtin/maintenance: make the geometric factor configurable
 5:  7e8f83d4753 =  5:  7a76003215e builtin/maintenance: don't silently ignore invalid strategy
 -:  ----------- >  6:  a6383d121b2 builtin/maintenance: improve readability of strategies
 6:  4217c37c0bf !  7:  e25c878a3ff builtin/maintenance: run maintenance tasks depending on type
    @@ builtin/gc.c: static int maintenance_run_tasks(struct maintenance_run_opts *opts
      		enum schedule_priority schedule;
      	} tasks[TASK__COUNT];
      };
    - 
    - static const struct maintenance_strategy none_strategy = { 0 };
    -+
    +@@ builtin/gc.c: static const struct maintenance_strategy none_strategy = { 0 };
      static const struct maintenance_strategy default_strategy = {
      	.tasks = {
    --		[TASK_GC].enabled = 1,
    -+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
    + 		[TASK_GC] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_MANUAL,
    + 		},
      	},
      };
    -+
    +@@ builtin/gc.c: static const struct maintenance_strategy default_strategy = {
      static const struct maintenance_strategy incremental_strategy = {
      	.tasks = {
    --		[TASK_COMMIT_GRAPH].enabled = 1,
    -+		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
    --		[TASK_PREFETCH].enabled = 1,
    -+		[TASK_PREFETCH].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_PREFETCH].schedule = SCHEDULE_HOURLY,
    --		[TASK_INCREMENTAL_REPACK].enabled = 1,
    -+		[TASK_INCREMENTAL_REPACK].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_INCREMENTAL_REPACK].schedule = SCHEDULE_DAILY,
    --		[TASK_LOOSE_OBJECTS].enabled = 1,
    -+		[TASK_LOOSE_OBJECTS].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY,
    --		[TASK_PACK_REFS].enabled = 1,
    -+		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY,
    + 		[TASK_COMMIT_GRAPH] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_HOURLY,
    + 		},
    + 		[TASK_PREFETCH] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_HOURLY,
    + 		},
    + 		[TASK_INCREMENTAL_REPACK] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_DAILY,
    + 		},
    + 		[TASK_LOOSE_OBJECTS] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_DAILY,
    + 		},
    + 		[TASK_PACK_REFS] = {
    +-			.enabled = 1,
    ++			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_WEEKLY,
    + 		},
      	},
    - };
     @@ builtin/gc.c: static void initialize_task_config(struct maintenance_run_opts *opts,
      {
      	struct strbuf config_name = STRBUF_INIT;
 7:  422b16a62a2 !  8:  ba147c3bf33 builtin/maintenance: extend "maintenance.strategy" to manual maintenance
    @@ Documentation/config/maintenance.adoc: detach.
     
      ## builtin/gc.c ##
     @@ builtin/gc.c: static const struct maintenance_strategy incremental_strategy = {
    - 		[TASK_LOOSE_OBJECTS].schedule = SCHEDULE_DAILY,
    - 		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED,
    - 		[TASK_PACK_REFS].schedule = SCHEDULE_WEEKLY,
    -+
    + 			.type = MAINTENANCE_TYPE_SCHEDULED,
    + 			.schedule = SCHEDULE_WEEKLY,
    + 		},
     +		/*
     +		 * Historically, the "incremental" strategy was only available
     +		 * in the context of scheduled maintenance when set up via
    @@ builtin/gc.c: static const struct maintenance_strategy incremental_strategy = {
     +		 * requested. This is the same as the default strategy, which
     +		 * would have been in use beforehand.
     +		 */
    -+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
    ++		[TASK_GC] = {
    ++			.type = MAINTENANCE_TYPE_MANUAL,
    ++		},
      	},
      };
      
 8:  07f5b32a22e !  9:  eebfab4acda builtin/maintenance: make "gc" strategy accessible
    @@ builtin/gc.c: struct maintenance_strategy {
     -static const struct maintenance_strategy default_strategy = {
     +static const struct maintenance_strategy gc_strategy = {
      	.tasks = {
    --		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_GC].type = MAINTENANCE_TYPE_MANUAL | MAINTENANCE_TYPE_SCHEDULED,
    -+		[TASK_GC].schedule = SCHEDULE_DAILY,
    + 		[TASK_GC] = {
    +-			.type = MAINTENANCE_TYPE_MANUAL,
    ++			.type = MAINTENANCE_TYPE_MANUAL | MAINTENANCE_TYPE_SCHEDULED,
    ++			.schedule = SCHEDULE_DAILY,
    + 		},
      	},
      };
    - 
     @@ builtin/gc.c: static struct maintenance_strategy parse_maintenance_strategy(const char *name)
      {
      	if (!strcasecmp(name, "incremental"))
 9:  c597ae7f94a ! 10:  936358736f3 builtin/maintenance: introduce "geometric" strategy
    @@ builtin/gc.c: static const struct maintenance_strategy incremental_strategy = {
      
     +static const struct maintenance_strategy geometric_strategy = {
     +	.tasks = {
    -+		[TASK_COMMIT_GRAPH].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_COMMIT_GRAPH].schedule = SCHEDULE_HOURLY,
    -+		[TASK_GEOMETRIC_REPACK].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_GEOMETRIC_REPACK].schedule = SCHEDULE_DAILY,
    -+		[TASK_PACK_REFS].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_PACK_REFS].schedule = SCHEDULE_DAILY,
    -+		[TASK_RERERE_GC].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_RERERE_GC].schedule = SCHEDULE_WEEKLY,
    -+		[TASK_REFLOG_EXPIRE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_REFLOG_EXPIRE].schedule = SCHEDULE_WEEKLY,
    -+		[TASK_WORKTREE_PRUNE].type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    -+		[TASK_WORKTREE_PRUNE].schedule = SCHEDULE_WEEKLY,
    ++		[TASK_COMMIT_GRAPH] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_HOURLY,
    ++		},
    ++		[TASK_GEOMETRIC_REPACK] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_DAILY,
    ++		},
    ++		[TASK_PACK_REFS] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_DAILY,
    ++		},
    ++		[TASK_RERERE_GC] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_WEEKLY,
    ++		},
    ++		[TASK_REFLOG_EXPIRE] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_WEEKLY,
    ++		},
    ++		[TASK_WORKTREE_PRUNE] = {
    ++			.type = MAINTENANCE_TYPE_SCHEDULED | MAINTENANCE_TYPE_MANUAL,
    ++			.schedule = SCHEDULE_WEEKLY,
    ++		},
     +	},
     +};
     +

---
base-commit: 0bb2c786c2349dd6700727153c13d81cbfb41710
change-id: 20251015-pks-maintenance-geometric-strategy-580c58581b01