Re: [RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph
From: Jeff King <hidden>
Date: 2018-10-05 20:09:52
On Fri, Oct 05, 2018 at 10:01:31PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > There's unfortunately not a fast way of doing that. One option would be to keep a counter of "ungraphed commit objects", and have callers update it. Anybody admitting a pack via index-pack or unpack-objects can easily get this information. Commands like fast-import can do likewise, and "git commit" obviously increments it by one. I'm not excited about adding a new global on-disk data structure (and the accompanying lock).
>
> You don't really need a new global datastructure to solve this problem. It would be sufficient to have git-gc itself write out a 4-line text file after it runs saying how many tags, commits, trees and blobs it found on its last run. You can then fuzzily compare object counts v.s. commit counts for the purposes of deciding whether something like the commit-graph needs to be updated, while assuming that whatever new data you have has similar enough ratios of those as your existing data.
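For concreteness, the quoted proposal could be sketched roughly as below. This is only an illustration, not anything git implements: the "<type> <count>" file layout and the names parse_gc_stats and estimate_new_commits are hypothetical.

```python
from typing import Dict

OBJECT_TYPES = ("tag", "commit", "tree", "blob")


def parse_gc_stats(text: str) -> Dict[str, int]:
    """Parse a hypothetical 4-line stats file, one '<type> <count>' per line."""
    counts: Dict[str, int] = {}
    for line in text.strip().splitlines():
        kind, value = line.split()
        counts[kind] = int(value)
    missing = [k for k in OBJECT_TYPES if k not in counts]
    if missing:
        raise ValueError("stats file lacks counts for: " + ", ".join(missing))
    return counts


def estimate_new_commits(stats: Dict[str, int], new_objects: int) -> float:
    """Guess how many of the new objects are commits.

    Assumes the new data has the same commit-to-object ratio as the
    data seen by the last gc run, which is the fuzzy part.
    """
    total = sum(stats[k] for k in OBJECT_TYPES)
    if total == 0:
        return 0.0
    return new_objects * stats["commit"] / total


# Example: 1000 of 10000 objects were commits at the last gc run.
stats = parse_gc_stats("tag 10\ncommit 1000\ntree 3000\nblob 5990\n")
estimated = estimate_new_commits(stats, 500)
```

The estimate is only as good as the assumption that new objects look like old ones.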
I think this is basically the same thing as Stolee's suggestion to keep the total object count in the commit-graph file. The only difference here is that we know the actual ratio of commits to blobs for this particular repository. But I don't think we need to know that. As you said, this is fuzzy anyway, so a single number for "update the graph when there are N new objects" is likely enough.

If you had a repository with an unusually large tree, you'd end up rebuilding the graph more often. But I think it would probably be OK, as we're primarily trying not to waste time doing a graph rebuild when we've only done a small amount of other work. But if we just shoved a ton of objects through index-pack then we did a lot of work, whether those were commit objects or not.

-Peff
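
A minimal sketch of the single-number check, assuming the total object count from the last graph write is available somewhere (for example, stored in the commit-graph file as suggested). The function name and the threshold value are made up for illustration; git has no such function or setting that I'm describing here.

```python
def should_update_graph(objects_now: int,
                        objects_at_last_write: int,
                        threshold: int) -> bool:
    """Return True once at least `threshold` objects of any type were added.

    No per-type ratios are consulted: a large batch of new objects means
    a lot of work was already done, so a graph rebuild is cheap by comparison.
    """
    return objects_now - objects_at_last_write >= threshold


# Hypothetical threshold of 1000 new objects.
small_change = should_update_graph(10_500, 10_000, threshold=1000)
big_fetch = should_update_graph(12_000, 10_000, threshold=1000)
```

With these numbers, 500 new objects do not trigger a rebuild but 2000 do, regardless of how many of them are commits.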