Thread: 62 messages, 4 authors, 2025-10-27

Re: [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task

From: Patrick Steinhardt <hidden>
Date: 2025-10-17 06:13:26

On Thu, Oct 16, 2025 at 03:51:17PM -0500, Justin Tobler wrote:
> On 25/10/16 09:26AM, Patrick Steinhardt wrote:
> > Introduce a new "geometric-repack" task. This task uses our geometric
> > repack infrastructure as provided by git-repack(1) itself, which is a
> > strategy that especially hosting providers tend to use to amortize the
> > costs of repacking objects.
> >
> > There is one issue though with geometric repacks, namely that they
> > unconditionally pack all loose objects, regardless of whether or not
> > they are reachable. This is done because it means that we can completely
> > skip the reachability step, which significantly speeds up the operation.
> > But it has the big downside that we are unable to expire objects over
> > time.
> >
> > To address this issue we thus use a split strategy in this new task:
> > whenever a geometric repack would merge together all packs, we instead
> > do an all-into-one repack. By default, these all-into-one repacks have
> > cruft packs enabled, so unreachable objects would now be written into
> > their own pack. Consequently, they won't be soaked up during geometric
> > repacking anymore and can be expired with the next full repack, assuming
> > that their expiry date has passed.
>
> So normal geometric repacks don't ever check for unreachable objects,
> even if all the packs are being merged together. With this new strategy
> though, when a geometric repack would normally merge together all packs,
> we instead do an all-into-one repack which does check for unreachable
> objects.
>
> Does checking for unreachable objects in this case slow down the repack
> significantly?

It'll certainly add some overhead, but I didn't quantify it. My gut
feeling is that the all-into-one repack is going to be slow by nature
anyway, as we have to rewrite all objects. Doing the reachability check
on top is of course going to slow it down even further, but the relative
impact is going to be smaller.

In any case, we have to perform a reachability check at some point,
otherwise we won't ever be able to prune unreachable objects. I guess
doing this at the point where we merge all packs into one is a
reasonable tradeoff.
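
To make the split strategy a bit more concrete, here is a rough Python
model of the decision. It is a simplification for illustration and not
the code from this patch: the names geometric_split and choose_repack
are made up, packs are weighted by object count, loose objects are
ignored, and a factor of 2 is assumed.

```python
def geometric_split(weights, factor=2):
    """Return how many of the smallest packs a geometric repack would merge.

    `weights` are per-pack object counts. Simplified model only; the
    function name and the default factor are assumptions.
    """
    packs = sorted(weights)
    n = len(packs)
    if n == 0:
        return 0

    # Walk down from the largest pack and stop at the first adjacent pair
    # that breaks the progression (larger < factor * smaller).
    i = n - 1
    while i > 0 and packs[i] >= factor * packs[i - 1]:
        i -= 1
    split = i + 1 if i > 0 else 0

    # Merging packs[:split] creates one new pack. Absorb further packs
    # until the remaining ones form a progression on top of it again.
    merged = sum(packs[:split])
    while split < n and packs[split] < factor * merged:
        merged += packs[split]
        split += 1
    return split


def choose_repack(weights, factor=2):
    """Pick the kind of repack under the split strategy (hypothetical helper)."""
    split = geometric_split(weights, factor)
    if split == 0:
        return "nothing"       # packs already form a progression
    if split == len(weights):
        return "all-into-one"  # everything would be merged: do a full repack
    return "geometric"         # only the smallest packs get merged
```

With packs of 3, 3 and 100 objects only the two small packs get merged,
so this stays a geometric repack. Packs of 8, 9 and 10 objects would all
collapse into one, which is the case where the task switches to the
all-into-one repack and pays for the reachability check.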

I think the more interesting question is whether we should maybe do this
all-into-one repack more often, so that we can prune more regularly.
With the proposed strategy you'd need to add a significant number of new
objects before unreachable objects would ever get pruned, because
otherwise we won't do the all-into-one repack.

I think for an initial version this is going to be fine, but we might
want to iterate on this eventually and add a time-based component to
the heuristics.
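
Such a time-based component could look roughly like the following. This
is purely hypothetical and not part of this series: needs_full_repack,
the 14-day default for max_age, and the idea of recording when the last
full repack happened are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def needs_full_repack(merges_all_packs, last_full_repack, now,
                      max_age=timedelta(days=14)):
    """Decide whether to do an all-into-one repack (hypothetical sketch).

    merges_all_packs: True if the geometric repack would merge every pack.
    last_full_repack: datetime of the last full repack, or None if unknown.
    """
    if merges_all_packs:
        # Existing rule: the geometry would rewrite everything anyway.
        return True
    if last_full_repack is None:
        # Assumption: with no record of a full repack, treat it as overdue.
        return True
    # Assumed time-based rule: force a full repack once enough time passed,
    # so unreachable objects get a chance to be pruned.
    return now - last_full_repack >= max_age
```

The point of the second rule would be to bound how long unreachable
objects can linger in a repository that only ever sees small changes.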

Patrick