Re: [PATCH 3/8] builtin/maintenance: introduce "geometric-repack" task
From: Patrick Steinhardt <hidden>
Date: 2025-10-17 06:13:26
On Thu, Oct 16, 2025 at 03:51:17PM -0500, Justin Tobler wrote:
On 25/10/16 09:26AM, Patrick Steinhardt wrote:
Introduce a new "geometric-repack" task. This task uses our geometric repack infrastructure as provided by git-repack(1) itself, which is a strategy that hosting providers in particular tend to use to amortize the costs of repacking objects.

There is one issue with geometric repacks though, namely that they unconditionally pack all loose objects, regardless of whether or not they are reachable. This is done because it means that we can completely skip the reachability check, which significantly speeds up the operation. But it has the big downside that we are unable to expire objects over time.

To address this issue we thus use a split strategy in this new task: whenever a geometric repack would merge together all packs, we instead do an all-into-one repack. By default, these all-into-one repacks have cruft packs enabled, so unreachable objects would now be written into their own pack. Consequently, they won't be soaked up during geometric repacking anymore and can be expired with the next full repack, assuming that their expiry date has passed.

So normal geometric repacks don't ever check for unreachable objects, even if all the packs are being merged together. With this new strategy though, when a geometric repack would normally merge together all packs, we instead do an all-into-one repack, which does check for unreachable objects. Does checking for unreachable objects in this case slow down the repack significantly?
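The split strategy described above can be sketched roughly as follows. This is a hypothetical illustration, not the code from git-repack(1) or from the patch: the function and enum names are invented, pack weights are assumed to be object counts sorted in ascending order, overflow checks are omitted, and loose objects are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not git's implementation: given pack weights
 * (e.g. object counts) sorted in ascending order, return how many of the
 * lightest packs a geometric repack with the given split factor would roll
 * up into a new pack. Overflow checks are omitted for brevity.
 */
static size_t geometric_split(const uint64_t *weights, size_t nr,
			      uint64_t factor)
{
	uint64_t rolled_up = 0;
	size_t split, i;

	if (!nr)
		return 0;

	for (i = 1; i < nr; i++)
		assert(weights[i - 1] <= weights[i]);

	/*
	 * Walk down from the heaviest pack while each pack is still at least
	 * `factor` times heavier than the next-lighter one.
	 */
	for (i = nr - 1; i > 0; i--)
		if (weights[i] < factor * weights[i - 1])
			break;

	/* If pack i is not `factor` times heavier than pack i - 1, packs 0..i are rolled up. */
	split = i ? i + 1 : 0;

	for (i = 0; i < split; i++)
		rolled_up += weights[i];

	/*
	 * The combined pack may now be too heavy for the next pack in line,
	 * so keep absorbing heavier packs until the progression holds again.
	 */
	for (i = split; i < nr; i++) {
		if (weights[i] >= factor * rolled_up)
			break;
		rolled_up += weights[i];
		split++;
	}

	return split;
}

enum repack_kind {
	REPACK_NOTHING,      /* packs already form a progression (loose objects ignored) */
	REPACK_GEOMETRIC,    /* roll up only the lighter packs */
	REPACK_ALL_INTO_ONE, /* everything would be merged: do a full repack instead */
};

static enum repack_kind choose_repack(const uint64_t *weights, size_t nr,
				      uint64_t factor)
{
	size_t split = geometric_split(weights, nr, factor);

	if (!split)
		return REPACK_NOTHING;
	/*
	 * A geometric repack would merge all packs anyway, so do an
	 * all-into-one repack, which checks reachability and can write
	 * unreachable objects into a cruft pack.
	 */
	if (split == nr)
		return REPACK_ALL_INTO_ONE;
	return REPACK_GEOMETRIC;
}
```

With a factor of 2, packs of 5, 6 and 100 objects only cause the two light packs to be rolled up, whereas packs of 5, 6 and 20 objects would all be merged, so the sketch picks the all-into-one repack instead.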
It'll certainly add some overhead, but I didn't quantify it. My gut feeling is that the all-into-one repack is going to be slow by nature anyway, as we have to rewrite all objects. Doing the reachability check on top is of course going to slow it down even further, but the relative impact is going to be smaller. In any case, we have to perform a reachability check at some point in time, otherwise we won't ever be able to prune unreachable objects. I guess doing this at the point where we merge all packs into one is a reasonable tradeoff.

I think the more interesting question is whether we should do this all-into-one repack more often, so that we can prune more regularly. With the proposed strategy you'd need to add a significant portion of new objects before we'd ever prune unreachable ones, because otherwise we won't do the all-into-one repack. I think for an initial version this is going to be fine, but we might want to iterate on this eventually and add a time-based component to the heuristics.

Patrick
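To put a rough number on "a significant portion", assume (purely for illustration) that pack weight is the object count, that the repository consists of one large base pack of B objects plus newer packs that a geometric repack rolls up into a pack of T objects, and that the split factor is f; loose objects are ignored. The base pack is only absorbed, which is what triggers the all-into-one repack, once it is no longer f times heavier than the rolled-up pack:

```latex
B < f \cdot T \iff T > \frac{B}{f}, \qquad \text{e.g. } f = 2,\; B = 10^{6} \implies T > 5 \cdot 10^{5}
```

Under these assumptions, with a factor of 2 roughly half as many new objects as the base pack holds have to accumulate before unreachable objects are even considered for pruning.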