Thread: 15 messages, 6 authors, 2022-03-17

Re: tb/cruft-packs (was Re: What's cooking in git.git (Mar 2022, #01; Thu, 3))

From: Derrick Stolee <hidden>
Date: 2022-03-07 20:51:47

On 3/7/2022 3:18 PM, Jonathan Nieder wrote:
Derrick Stolee wrote:
On 3/7/2022 1:18 PM, Taylor Blau wrote:
On Mon, Mar 07, 2022 at 10:06:00AM -0800, Jonathan Nieder wrote:
 2. Marking this as a repository format extension so it doesn't interact
    poorly with Git implementations (including older versions of Git
    itself) that are not aware of the new feature
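For context on what such an extension would buy, here is a rough Python model of the gate a format extension creates. It is a simplification, not Git's code: it assumes only repositories with `core.repositoryformatversion = 1` enforce `extensions.*` keys, and the `extensions.cruftpacks` key is hypothetical (whether any extension is needed is exactly what this thread debates).

```python
def unknown_extensions(config: dict, known: set) -> list:
    """List the extensions.* names a given Git version does not recognize."""
    names = (key.split(".", 1)[1] for key in config if key.startswith("extensions."))
    return sorted(name for name in names if name not in known)


def can_operate(config: dict, known: set) -> bool:
    """Simplified gate: only format version 1 enforces extensions."""
    version = int(config.get("core.repositoryformatversion", "0"))
    if version < 1:
        return True  # sketch assumption: version 0 ignores extensions.*
    return not unknown_extensions(config, known)


# A hypothetical cruft-pack extension locks out any version that predates it.
repo = {"core.repositoryformatversion": "1", "extensions.cruftpacks": "true"}
print(can_operate(repo, known={"objectformat"}))                # False
print(can_operate(repo, known={"objectformat", "cruftpacks"}))  # True
```

The cost of an extension is visible in the first call: an older version refuses the repository outright, rather than falling back to its existing behavior.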
The design of cruft packs was done intentionally to avoid needing a
format extension. The cruft pack is "just a pack" to any older version
of Git. The only thing an older version of Git wouldn't understand is
how to interpret the .mtimes file. But that's no different than the
current behavior without cruft packs, where any unreachable object
inherits the mtime of its containing pack.

So an older version of Git might prune a different set of objects than a
version that understands cruft packs, depending on the contents of the
.mtimes file, the mtime of the cruft pack, and the width of the grace
period. But I think by downgrading you are more or less buying into the
existing behavior. So I don't think there is a compelling reason to
introduce a format extension here.
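The disagreement described above can be modeled roughly as follows. This is an in-memory sketch, not Git's implementation: `CruftPack` and `prunable` are hypothetical names, times are plain integers chosen for illustration, and an object is assumed prunable when its effective mtime is not newer than `now - grace`.

```python
from dataclasses import dataclass


@dataclass
class CruftPack:
    mtime: int    # the pack file's own mtime
    mtimes: dict  # object ID -> mtime, as recorded in the .mtimes file


def prunable(pack: CruftPack, now: int, grace: int, understands_mtimes: bool) -> set:
    """Objects whose effective mtime is not newer than now - grace."""
    cutoff = now - grace

    def effective(oid: str) -> int:
        # A cruft-pack-aware version reads the per-object time; an older
        # version only sees the containing pack's mtime.
        return pack.mtimes[oid] if understands_mtimes else pack.mtime

    return {oid for oid in pack.mtimes if effective(oid) <= cutoff}


pack = CruftPack(mtime=1000, mtimes={"a": 100, "b": 900})
for grace in (1500, 500):
    new = sorted(prunable(pack, 2000, grace, understands_mtimes=True))
    old = sorted(prunable(pack, 2000, grace, understands_mtimes=False))
    print(grace, new, old)
# 1500 ['a'] []
# 500 ['a', 'b'] ['a', 'b']
```

With the wider grace period the two versions prune different sets; with the narrower one they agree, which matches the point that the outcome depends on the .mtimes contents, the pack's mtime, and the grace period.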
In particular, older versions would first explode unreachable objects
out of the cruft pack and into loose objects before expiring any of
them based on the loose objects' mtimes. There is no risk here of causing
problems with older versions of Git, so this does not need an extension.
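A rough model of that older-Git sequence, assuming (as stated earlier in the thread) that each exploded object inherits the cruft pack's mtime and that expiry then looks only at loose-object mtimes. `explode` and `expire_loose` are hypothetical helpers, not Git functions.

```python
def explode(pack_mtime: int, mtimes: dict) -> dict:
    """Older Git: write every unreachable object loose, ignoring .mtimes.

    Each loose copy gets the containing pack's mtime.
    """
    return {oid: pack_mtime for oid in mtimes}


def expire_loose(loose: dict, now: int, grace: int) -> dict:
    """Keep only loose objects newer than the grace-period cutoff."""
    cutoff = now - grace
    return {oid: t for oid, t in loose.items() if t > cutoff}


loose = explode(1000, {"a": 100, "b": 900})
print(loose)                            # {'a': 1000, 'b': 1000}
print(expire_loose(loose, 2000, 1500))  # {'a': 1000, 'b': 1000}
print(expire_loose(loose, 2000, 500))   # {}
```

The per-object times (100 and 900) play no part once the objects are loose; only the pack's mtime matters, which is the pre-cruft-pack behavior.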
Surely when older and newer versions are acting on the same repository, they
would fight by exploding out unreachable objects, packing them back
into a cruft pack, etc, no?
You are referring to a situation where there are multiple possible
versions responsible for maintaining a repository. Git does not
support parallel writers doing significant updates like full
repacks and GCs, and instead relies on the user to control the
concurrency there. The standard we keep to is that parallel readers
can still access the repo during this time.
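One way a user might enforce that single-writer discipline is to wrap every repack or GC in an exclusive lock file. This is a hypothetical sketch: `maintenance_lock` and the lock path are illustrative and are not a Git feature.

```python
import os
from contextlib import contextmanager


@contextmanager
def maintenance_lock(path: str):
    """Allow at most one maintenance job at a time per lock path.

    O_CREAT | O_EXCL makes creation atomic: a second caller gets FileExistsError.
    """
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder's PID
        yield
    finally:
        os.close(fd)
        os.unlink(path)  # release the lock
```

A job would run its repack or GC inside `with maintenance_lock(...)`. A second job that tries to take the same lock fails immediately instead of repacking concurrently. A lock left behind by a crashed job would need manual removal in this design.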

If someone were running a setup with these parallel maintenance
processes, then they would already be risking failure with existing
features (though in those cases it is actually the old version that
breaks the new one): what if the new and old versions differ in their
understanding of the commit-graph? The old one could remove commits
without updating the commit-graph, leaving extra commits in that file
that the new one would fail to verify. What about the
multi-pack-index? The new version would try to load objects from
missing pack-files, since the old version deleted those packs without
updating the multi-pack-index.
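Both failure modes come down to an index file that outlives the data it describes. Here is a toy model, with plain dicts and sets standing in for the on-disk multi-pack-index, commit-graph, and object store. The function names are hypothetical, and these are not Git's data structures.

```python
def missing_packs(midx: dict, packs_on_disk: set) -> list:
    """Objects the multi-pack-index says to load from packs that were deleted."""
    return sorted(oid for oid, pack in midx.items() if pack not in packs_on_disk)


def unverifiable(commit_graph: set, object_store: set) -> list:
    """Commits listed in the commit-graph that no longer exist in the object store."""
    return sorted(commit_graph - object_store)


# The old version deletes pack-1 and prunes commit c2 but rewrites neither index.
midx = {"o1": "pack-1", "o2": "pack-2"}
print(missing_packs(midx, {"pack-2", "pack-3"}))  # ['o1']
print(unverifiable({"c1", "c2"}, {"c1"}))         # ['c2']
```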

At least with cruft packs, the worst case is that no objects are
ever expired, because they keep toggling between loose objects and
cruft packs.
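That worst case can be simulated by alternating the two versions. The sketch assumes the new version keeps each object's mtime in `.mtimes` and drops objects past the grace period, while the old version explodes the pack so every object inherits the pack's fresh mtime. `new_gc`, `old_gc`, and `alternate` are hypothetical names, and the times are illustrative integers.

```python
def new_gc(loose: dict, now: int, grace: int):
    """Cruft-pack-aware GC: expire old objects, pack the rest with their mtimes."""
    cutoff = now - grace
    mtimes = {oid: t for oid, t in loose.items() if t > cutoff}
    return now, mtimes  # (cruft pack file mtime, .mtimes contents)


def old_gc(pack_mtime: int, mtimes: dict, now: int, grace: int) -> dict:
    """Older GC: explode the pack (objects inherit its mtime), then expire."""
    cutoff = now - grace
    return {oid: pack_mtime for oid in mtimes if pack_mtime > cutoff}


def alternate(loose: dict, start: int, step: int, grace: int, rounds: int) -> dict:
    """Run the new GC, then the old GC, every `step` time units."""
    now = start
    for _ in range(rounds):
        pack_mtime, mtimes = new_gc(loose, now, grace)
        now += step
        loose = old_gc(pack_mtime, mtimes, now, grace)
        now += step
    return loose


print(alternate({"a": 90}, start=100, step=10, grace=50, rounds=5))  # {'a': 180}
print(new_gc({"a": 90}, now=200, grace=50))                          # (200, {})
```

If only the new version ran, object `a` would be expired by time 200, but the alternating runs keep refreshing its mtime. In this model that only happens when the runs are closer together than the grace period: with `step=60`, `a` is expired after a single round.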

Thanks,
-Stolee