Re: [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch
From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2018-05-10 19:17:13
On Thu, May 10 2018, Derrick Stolee wrote:
> The behavior in this patch series does the following:
>
> 1. Near the end of 'git gc', run 'git commit-graph write'. The
>    location of this code assumes that a 'git gc --auto' has not
>    terminated early due to not meeting the auto threshold.
>
> 2. At the end of 'git fetch', run 'git commit-graph write'. This
>    means that every reachable commit will be in the commit-graph
>    after a successful fetch, which seems a reasonable frequency.
>    Then, the only times we would be missing a reachable commit is
>    after creating one locally. There is a problem with the current
>    patch, though: every 'git fetch' call runs 'git commit-graph
>    write', even if there were no ref updates or objects downloaded.
>    Is there a simple way to detect if the fetch was non-trivial?
>
> One obvious problem with this approach: if we compute this during
> 'gc' AND 'fetch', there will be times where a 'fetch' calls 'gc' and
> triggers two commit-graph writes. If I were to abandon one of these
> patches, it would be the 'fetch' integration. A 'git gc' really wants
> to delete all references to unreachable commits, and without updating
> the commit-graph we may still have commit data in the commit-graph
> file that is not in the object database. In fact, deleting commits
> from the object database but not from the commit-graph will cause
> 'git commit-graph verify' to fail!
>
> I welcome discussion on these ideas, as we are venturing out of the
> "pure data structure" world and into the "user experience" world. I
> am less confident in my skills in this world, but the feature is
> worthless if it does not improve the user experience.
I really like #1 here, but I wonder why #2 is necessary. I.e. is it critical for the performance of the commit-graph feature that it be kept really up-to-date, more so than other things that rely on gc --auto (e.g. the optional bitmap index)?

Even if that's the case, I think doing this via gc --auto is a much better option. We already have gc.auto & gc.autoPackLimit, so if the answer to my question above is "yes", this could also be accomplished by introducing a new graph-specific gc.* setting: --auto would then just update the graph more often, but leave the rest alone.
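
To make the gc --auto idea concrete, here is a rough sketch of the decision logic, written in Python purely for illustration (it is not how builtin/gc.c is structured). The 6700 and 50 defaults are the documented defaults of gc.auto and gc.autoPackLimit. Everything graph-related is an assumption: the auto_graph_limit setting, its default of 100, and the idea that we can cheaply count commits missing from the graph are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AutoGcConfig:
    auto: int = 6700             # gc.auto: loose-object threshold
    auto_pack_limit: int = 50    # gc.autoPackLimit: pack-count threshold
    # HYPOTHETICAL graph-specific knob; no such setting exists in git.
    auto_graph_limit: int = 100


def plan_auto_gc(loose_objects: int,
                 packs: int,
                 commits_missing_from_graph: int,
                 cfg: Optional[AutoGcConfig] = None) -> List[str]:
    """Return the tasks a 'gc --auto' run would perform, in order."""
    cfg = cfg or AutoGcConfig()

    # Simplification: gc.auto <= 0 disables automatic gc entirely.
    if cfg.auto <= 0:
        return []

    too_many_loose = loose_objects > cfg.auto
    too_many_packs = cfg.auto_pack_limit > 0 and packs > cfg.auto_pack_limit

    if too_many_loose or too_many_packs:
        # Full run: objects may get pruned, so the graph must be
        # rewritten afterwards to stay consistent with the object db.
        return ["repack", "commit-graph write"]

    stale_graph = (cfg.auto_graph_limit > 0
                   and commits_missing_from_graph > cfg.auto_graph_limit)
    if stale_graph:
        # Cheap path: update only the graph and leave the rest.
        return ["commit-graph write"]

    return []
```

With something of this shape, a fetch that triggers gc --auto gets at most one commit-graph write per run, and a fetch that brings in nothing new stays below every threshold and does no work at all.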