Re: [PATCH v4 08/10] commit: add short-circuit to paint_down_to_common()
From: Derrick Stolee <hidden>
Date: 2018-05-01 11:47:07
On 4/30/2018 6:19 PM, Jakub Narebski wrote:
Derrick Stolee [off-list ref] writes:quoted
When running 'git branch --contains', the in_merge_bases_many() method calls paint_down_to_common() to discover if a specific commit is reachable from a set of branches. Commits with lower generation number are not needed to correctly answer the containment query of in_merge_bases_many(). Add a new parameter, min_generation, to paint_down_to_common() that prevents walking commits with generation number strictly less than min_generation. If 0 is given, then there is no functional change.This is thanks to the fact that generation numbers start at zero (as special case, though, with _ZERO), and we use strict inequality to avoid handling _ZERO etc. in a special way. As you wrote in response in previous version of this series, because paint_down_to_common() is file-local, there is no need to come up with symbolic name for GENERATION_NO_CUTOFF case. All right.quoted
For in_merge_bases_many(), we can pass commit->generation as the cutoff, and this saves time during 'git branch --contains' queries that would otherwise walk "around" the commit we are inspecting.All right, and when using paint_down_to_common() to actually find merge bases, and not only check for containment, we cannot use cutoff. Therefore at least one call site needs to run it without functional change... which we can do. Good.quoted
For a copy of the Linux repository, where HEAD is checked out at v4.13~100, we get the following performance improvement for 'git branch --contains' over the previous commit: Before: 0.21s After: 0.13s Rel %: -38%Nice.quoted
Signed-off-by: Derrick Stolee <redacted> --- commit.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)Let me reorder chunks a bit to make it easier to review.quoted
diff --git a/commit.c b/commit.c index 7bb007f56a..e2e16ea1a7 100644 --- a/commit.c +++ b/commit.c@@ -1070,7 +1080,7 @@ int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit * if (commit->generation > min_generation) return ret; - bases = paint_down_to_common(commit, nr_reference, reference); + bases = paint_down_to_common(commit, nr_reference, reference, commit->generation); if (commit->object.flags & PARENT2) ret = 1; clear_commit_marks(commit, all_flags);@@ -808,11 +808,14 @@ static int queue_has_nonstale(struct prio_queue *queue) } /* all input commits in one and twos[] must have been parsed! */ -static struct commit_list *paint_down_to_common(struct commit *one, int n, struct commit **twos) +static struct commit_list *paint_down_to_common(struct commit *one, int n, + struct commit **twos, + int min_generation) { struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; struct commit_list *result = NULL; int i; + uint32_t last_gen = GENERATION_NUMBER_INFINITY; one->object.flags |= PARENT1; if (!n) {@@ -831,6 +834,13 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n, struc struct commit_list *parents; int flags; + if (commit->generation > last_gen) + BUG("bad generation skip"); + last_gen = commit->generation;Shouldn't we provide more information about where the problem is to the user, to make it easier to debug the repository / commit-graph data? Good to have this sanity check here.
This BUG() _should_ only be seen by developers who add callers which do
not load commits from the commit-graph file. There is a chance that
there are cases not covered by this patch and the added tests, though.
Hopefully we catch them all by dogfooding the feature before turning it
on by default.
I can add the following to help debug these bad situations:
+ BUG("bad generation skip %d > %d at %s",
+ commit->generation, last_gen,
+ oid_to_hex(&commit->object.oid));
quoted
+ + if (commit->generation < min_generation) + break;So the reasoning for this, as far as I understand, is the following. Please correct me if I am wrong. The callsite with non-zero min_generation, in_merge_bases_many(), tries to find out if "commit" is an ancestor of one of the "references". At least one of "references" is above "commit", so in_merge_bases_many() uses paint_down_to_common() - but is interested only if "commit" was painted as reachable from one of "references". Thus we can interrupt the walk if we know that none of [considered] commits in the queue can reach "commit"/"one" - as if they were all STALE. The search is done using priority queue (a bit like in Dijkstra algorithm), with newer commits - with larger generation numbers - considered first. Thus if current commit has generation number less than min_generation cutoff, i.e. if it is below "commit", then all remaining commits in the queue are below cutoff. Good.quoted
+ flags = commit->object.flags & (PARENT1 | PARENT2 | STALE); if (flags == (PARENT1 | PARENT2)) { if (!(commit->object.flags & RESULT)) {@@ -879,7 +889,7 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co return NULL; } - list = paint_down_to_common(one, n, twos); + list = paint_down_to_common(one, n, twos, 0);When calculating merge bases there is no such possibility of an early return due to generation number cutoff. All right then.quoted
while (list) { struct commit *commit = pop_commit(&list);@@ -946,7 +956,7 @@ static int remove_redundant(struct commit **array, int cnt) filled_index[filled] = j; work[filled++] = array[j]; } - common = paint_down_to_common(array[i], filled, work); + common = paint_down_to_common(array[i], filled, work, 0);Here we are interested not only if "one"/array[i] is reachable from "twos"/work, but also if "twos" is reachable from "one". Simple cutoff only works in one way, though I wonder if we couldn't use cutoff being minimum generation number of "one" and "twos" together. But that may be left for a separate commit (after checking that the above is correct). Not as simple and obvious as paint_down_to_common() used in in_merge_bases_any(), so it is all right.
Thanks for reporting this. Since we are only concerned about reachability in this method, it is a good candidate to use min_generation. It is also subtle enough that we should leave it as a separate commit. Also, we can measure performance improvements separately, as I will mention in my commit message (but I'll copy it here): For a copy of the Linux repository, we measured the following performance improvements: git merge-base v3.3 v4.5 Before: 234 ms After: 208 ms Rel %: -11% git merge-base v4.3 v4.5 Before: 102 ms After: 83 ms Rel %: -19% The experiments above were chosen to demonstrate that we are improving the filtering of the merge-base set. In the first example, more time is spent walking the history to find the set of merge bases before the remove_redundant() call. The starting commits are closer together in the second example, therefore more time is spent in remove_redundant(). The relative change in performance differs as expected.
quoted
if (array[i]->object.flags & PARENT2) redundant[i] = 1; for (j = 0; j < filled; j++)