Re: [PATCH v4 10/10] commit-graph.txt: update design document
From: Derrick Stolee <hidden>
Date: 2018-05-01 12:00:57
On 4/30/2018 7:32 PM, Jakub Narebski wrote:
Derrick Stolee [off-list ref] writes:quoted
We now calculate generation numbers in the commit-graph file and use them in paint_down_to_common(). Expand the section on generation numbers to discuss how the three special generation numbers GENERATION_NUMBER_INFINITY, _ZERO, and _MAX interact with other generation numbers. Signed-off-by: Derrick Stolee <redacted>Looks good.quoted
--- Documentation/technical/commit-graph.txt | 30 +++++++++++++++++++----- 1 file changed, 24 insertions(+), 6 deletions(-)diff --git a/Documentation/technical/commit-graph.txt b/Documentation/technical/commit-graph.txt index 0550c6d0dc..d9f2713efa 100644 --- a/Documentation/technical/commit-graph.txt +++ b/Documentation/technical/commit-graph.txt@@ -77,6 +77,29 @@ in the commit graph. We can treat these commits as having "infinite" generation number and walk until reaching commits with known generation number. +We use the macro GENERATION_NUMBER_INFINITY = 0xFFFFFFFF to mark commits not +in the commit-graph file. If a commit-graph file was written by a version +of Git that did not compute generation numbers, then those commits will +have generation number represented by the macro GENERATION_NUMBER_ZERO = 0. + +Since the commit-graph file is closed under reachability, we can guarantee +the following weaker condition on all commits: + + If A and B are commits with generation numbers N amd M, respectively, + and N < M, then A cannot reach B. + +Note how the strict inequality differs from the inequality when we have +fully-computed generation numbers. Using strict inequality may result in +walking a few extra commits,The linux kernel commit graph has maximum of 513 commits sharing the same generation number, but is is 5.43 commits sharing the same generation number on average, with standard deviation 10.70; median is even lower: it is 2, with 5.35 median absolute deviation (MAD). So on average it would be a few extra commits. Right.quoted
but the simplicity in dealing with commits +with generation number *_INFINITY or *_ZERO is valuable.As I wrote before, handling those corner cases in more complicated, but not that complicated. We could simply use stronger condition if both generation numbers are ordinary generation numbers, and weaker condition when at least one generation number has one of those special values.quoted
+ +We use the macro GENERATION_NUMBER_MAX = 0x3FFFFFFF to for commits whose +generation numbers are computed to be at least this value. We limit at +this value since it is the largest value that can be stored in the +commit-graph file using the 30 bits available to generation numbers. This +presents another case where a commit can have generation number equal to +that of a parent.Ordinary generation numbers, where stronger condition holds, are those between GENERATION_NUMBER_ZERO < gen(C) < GENERATION_NUMBER_MAX.quoted
+ Design Details --------------@@ -98,17 +121,12 @@ Future Work - The 'commit-graph' subcommand does not have a "verify" mode that is necessary for integration with fsck. -- The file format includes room for precomputed generation numbers. These - are not currently computed, so all generation numbers will be marked as - 0 (or "uncomputed"). A later patch will include this calculation. -Good.quoted
- After computing and storing generation numbers, we must make graph walks aware of generation numbers to gain the performance benefits they enable. This will mostly be accomplished by swapping a commit-date-ordered priority queue with one ordered by generation number. The following - operations are important candidates: + operation is an important candidate: - - paint_down_to_common() - 'log --topo-order'Another possible candidates: - remove_redundant() - see comment in previous patch - still_interesting() - where Git uses date slop to stop walking too far
remove_redundant() will be included in v5, thanks. Instead of "still_interesting()" I'll add "git tag --merged" as the candidate to consider, as discussed in [1]. [1] https://public-inbox.org/git/87fu3g67ry.fsf@lant.ki.iif.hu/t/#u "branch --contains / tag --merged inconsistency"
quoted
- Currently, parse_commit_gently() requires filling in the root treeOne important issue left is handling features that change view of project history, and their interaction with commit-graph feature. What would happen, if we turn on commit-graph feature, generate commit graph file, and then: * use graft file or remove graft entries to cut history, or remove cut or join two [independent] histories. * use git-replace mechanims to do the same * in shallow clone, deepen or shorten the clone What would happen if without re-generating commit-graph file (assuming tha Git wouldn't do it for us), we run some feature that makes use of commit-graph data: - git branch --contains - git tag --contains - git rev-list A..B
The commit-graph is not supported in these scenarios (yet). grafts are specifically mentioned in the future work section. I'm not particularly interested in supporting these features, so they are good venues for other contributors to get involved in the commit-graph feature. Eventually, they will be blockers to making the commit-graph feature a "default" feature. That is when I will pay attention to these situations. For now, a user must opt-in to having a commit-graph file (and that same user has possibly opted in to these history modifying features). Thanks, -Stolee