Re: [PATCH v2 0/8] Optimization batch 9: avoid detecting irrelevant renames
From: Derrick Stolee <hidden>
Date: 2021-03-09 22:09:32
On 3/8/2021 7:09 PM, Elijah Newren via GitGitGadget wrote:> The basic idea here is:
We only need expensive rename detection on the subset of files changed on
both sides of history (for the most part).
This is because:
1. The primary reason for rename detection in merges is enabling three-way
content merges
2. The purpose of three-way content merges is reconciling changes when
both sides of history modified some file 3. If a file was only modified by
the side that renamed the file, then detecting the rename is irrelevant;
we'll get the same answer without knowing about the rename. 4. (Well...there
are rare cases where we need the rename for reasons other than three-way
content merges. Patch 5 explains those.)Makes sense. Don't compute information you won't need. I look forward to trying to figure out the special cases here.
Before Series After Series no-renames: 12.596 s ± 0.061 s 5.680 s ± 0.096 s mega-renames: 130.465 s ± 0.259 s 13.812 s ± 0.162 s just-one-mega: 3.958 s ± 0.010 s 506.0 ms ± 3.9 ms
These are _very_ impressive numbers for such a "simple" idea.
However, interestingly, if we had ignored the basename-guided rename
detection optimizations[2][3], then this optimization series would have
improved the performance as follows:
Before Basename Series After Just This Series
no-renames: 13.815 s ± 0.062 s 5.728 s ± 0.104 s
mega-renames: 1799.937 s ± 0.493 s 18.213 s ± 0.139 s
just-one-mega 51.289 s ± 0.019 s 891.9 ms ± 7.0 msAnd here it is even more impressive. I see that your optimizations are having combined effects but also are doing valuable things on their own.
We get best results by prioritizing them as follows: 1. exact rename detection 2. skip-because-unnecessary 3. basename-guided rename detection
This makes sense to me, since even the basename-guided rename is doing some non-trivial work. It would be good to reduce that effort.
it means that our remaining optimization potential is somewhat limited, and subsequent optimization series will have to fight for much smaller gains.
This is a good place to end up. Let the code rest for a bit after we are done here, and maybe we'll find new cases to care about later. We could chase the long tail forever, but these steps are a huge accomplishment! Getting to reading now. -Stolee