Thread (39 messages) 39 messages, 3 authors, 2021-03-15

Re: [PATCH v2 0/8] Optimization batch 9: avoid detecting irrelevant renames

From: Derrick Stolee <hidden>
Date: 2021-03-09 22:09:32

On 3/8/2021 7:09 PM, Elijah Newren via GitGitGadget wrote:> The basic idea here is:
We only need expensive rename detection on the subset of files changed on
both sides of history (for the most part).

This is because:

 1. The primary reason for rename detection in merges is enabling three-way
    content merges
 2. The purpose of three-way content merges is reconciling changes when

both sides of history modified some file 3. If a file was only modified by
the side that renamed the file, then detecting the rename is irrelevant;
we'll get the same answer without knowing about the rename. 4. (Well...there
are rare cases where we need the rename for reasons other than three-way
content merges. Patch 5 explains those.)
Makes sense. Don't compute information you won't need. I look forward to
trying to figure out the special cases here.
 
                     Before Series           After Series
no-renames:       12.596 s ±  0.061 s     5.680 s ±  0.096 s
mega-renames:    130.465 s ±  0.259 s    13.812 s ±  0.162 s
just-one-mega:     3.958 s ±  0.010 s   506.0  ms ±  3.9  ms
These are _very_ impressive numbers for such a "simple" idea.
 
However, interestingly, if we had ignored the basename-guided rename
detection optimizations[2][3], then this optimization series would have
improved the performance as follows:

               Before Basename Series   After Just This Series
no-renames:      13.815 s ±  0.062 s      5.728 s ±  0.104 s
mega-renames:  1799.937 s ±  0.493 s     18.213 s ±  0.139 s
just-one-mega    51.289 s ±  0.019 s    891.9  ms ±  7.0  ms
And here it is even more impressive. I see that your optimizations are
having combined effects but also are doing valuable things on their
own.
We get best results by prioritizing them as follows:

 1. exact rename detection
 2. skip-because-unnecessary
 3. basename-guided rename detection
This makes sense to me, since even the basename-guided rename is
doing some non-trivial work. It would be good to reduce that
effort.
it means that our remaining optimization potential is
somewhat limited, and subsequent optimization series will have to fight for
much smaller gains.
This is a good place to end up. Let the code rest for a bit after
we are done here, and maybe we'll find new cases to care about
later. We could chase the long tail forever, but these steps are
a huge accomplishment!

Getting to reading now.

-Stolee
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help