Re: [PATCH 1/2] builtin/repack.c: simplify cruft pack aggregation
From: Patrick Steinhardt <hidden>
Date: 2025-02-28 07:52:08
On Thu, Feb 27, 2025 at 01:29:28PM -0500, Taylor Blau wrote:
In 37dc6d8104 (builtin/repack.c: implement support for
`--max-cruft-size`, 2023-10-02), 'git repack' built on support for
multiple cruft packs in Git by instructing 'git pack-objects --cruft'
how to aggregate smaller cruft packs up to the provided threshold.
The implementation in 37dc6d8104 worked something like the following
pseudo-code:
    total_size = 0;

    for (p in cruft packs) {
        if (p->pack_size + total_size < max_size) {
            total_size += p->pack_size;
            collapse(p)
        } else {
            retain(p);
        }
    }
The original idea behind this approach was that smaller cruft packs
would get combined together until the sum of their sizes was no larger
than the given max pack size.
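
For readers who want to trace the old behaviour by hand, the pseudo-code
can be sketched as compilable C. This is only an illustration, not the
code from 37dc6d8104: `struct cruft_pack`, its `collapsed` field, and
`aggregate_greedy()` are hypothetical names, and the only property
modelled is each pack's size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a cruft pack; only its size matters here. */
struct cruft_pack {
	uint64_t pack_size;
	int collapsed; /* 1 if rolled into the new pack, 0 if retained */
};

/*
 * Mirrors the pseudo-code: walk the packs in order and collapse each one
 * whose size, added to what has been collapsed so far, stays below
 * max_size. Everything else is retained. Returns the combined size of
 * the collapsed packs.
 */
static uint64_t aggregate_greedy(struct cruft_pack *packs, size_t nr,
				 uint64_t max_size)
{
	uint64_t total_size = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (packs[i].pack_size + total_size < max_size) {
			total_size += packs[i].pack_size;
			packs[i].collapsed = 1;
		} else {
			packs[i].collapsed = 0;
		}
	}
	return total_size;
}
```

With packs of size 40, 70 and 30 and a limit of 100, the first and third
packs are collapsed and the second is retained, so the outcome depends on
the order in which the packs are visited.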
There is a much simpler way to achieve this, however, which is to simply
combine *all* cruft packs which are smaller than the threshold,
regardless of what their sum is. With '--max-pack-size', 'pack-objects'
will split out the resulting pack into individual pack(s) if necessary
to ensure that the written pack(s) are each no larger than the provided
threshold.

Hm. So the result would be a new set of packfiles where each of them is
smaller than the threshold, right? Wouldn't that mean that the next time
we'll again do the same thing and try to combine the new set of cruft
packs into one, and basically never arrive at a state where we don't
touch the cruft packs anymore?

Patrick
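
One way to reason about the question above is a small size-only model of
a single repack pass. This is a rough sketch under stated assumptions,
not how 'pack-objects' works internally: `combine_small()` and
`split_count()` are hypothetical helpers, and the split is idealized as
cutting the combined bytes into chunks of at most the threshold, whereas
real packs are split on object boundaries.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sum the sizes of all packs strictly smaller than max_size; in this
 * model those are the packs that get combined. *nr_combined receives
 * how many packs were picked.
 */
static uint64_t combine_small(const uint64_t *sizes, size_t nr,
			      uint64_t max_size, size_t *nr_combined)
{
	uint64_t total = 0;
	size_t i;

	*nr_combined = 0;
	for (i = 0; i < nr; i++) {
		if (sizes[i] < max_size) {
			total += sizes[i];
			(*nr_combined)++;
		}
	}
	return total;
}

/*
 * Idealized split: how many output packs result when `total` bytes are
 * cut into chunks of at most max_size. Assumes max_size > 0.
 */
static size_t split_count(uint64_t total, uint64_t max_size)
{
	return (size_t)((total + max_size - 1) / max_size);
}
```

In this model, three packs of size 60 with a limit of 100 are all picked
and come out as two packs (100 and 80). On the next pass only the
80-sized pack is picked again. Whether real output packs land exactly on
the threshold or slightly below it is precisely what the model does not
capture, so it frames the question rather than answering it.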