Thread (31 messages) 31 messages, 6 authors, 2022-11-14

Re: [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after

From: Victoria Dye <hidden>
Date: 2022-11-09 22:19:21

Derrick Stolee wrote:
On 11/8/2022 5:44 PM, Victoria Dye via GitGitGadget wrote:
quoted
Following up on a discussion [1] around cache tree refreshes in 'git reset',
this series updates callers of 'unpack_trees()' to skip its internal
invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
expensive operation, and it is redundant when 'prime_cache_tree()' clears
and rebuilds the cache tree from scratch immediately after.

The first patch adds a test directly comparing the execution time of
'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
that on a fully-valid cache tree, they perform the same, but on a
fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster
(although both are so fast that the total execution time of 100 invocations
is needed to compare the results in the default perf repo).
One thing I found interesting is how you needed 200 iterations to show
a meaningful change in this test script, but in the case of 'git reset'
we can see sizeable improvements even with a single iteration.
All of the new performance tests run with multiple iterations: 20 for reset
(10 iterations of two resets each), 100 for read-tree, 200 for the
comparison of 'cache_tree_update()' & 'prime_cache_tree()'. Those counts
were picked mostly by trial-and-error, to strike a balance of "the test
doesn't take too long to run" and "the change in execution time is clearly
visible in the results."
Is there something about this test that is artificially speeding up
these iterations? Perhaps the index has up-to-date filesystem information
that allows these methods to avoid filesystem interactions that are
necessary in the 'git reset' case?
I would expect the "cache_tree_update, invalid" test's execution time, when
scaled to the iterations of 'read-tree' and 'reset', to match the change in
timing of those commands, but the command tests are reporting *much* larger
improvements (e.g., I'd expect a 0.27s improvement in 'git read-tree', but
the results are *consistently* >=0.9s).

Per trace2 logs, a single invocation of 'read-tree' matching the one added
in 'p0006' spent 0.010108s in 'cache_tree_update()'. Over 100 iterations,
the total time would be ~1.01s, which lines up with the 'p0006' test
results. However, the trace2 results for "test-tool cache-tree --count 3
--fresh --update" show the first iteration taking 0.013060s (looks good),
then the next taking 0.003755s, then 0.004026s (_much_ faster than
expected).

To be honest, I can't figure out what's going on there. It might be some
kind of runtime/memory optimization with repeatedly rebuilding the same
cache tree (doesn't seem to be compiler optimization, since the speedup
still happens with '-O0'). The only sure-fire way to avoid it seems to be
moving the iteration outside of 'test-cache-tree.c' and into 'p0090'.
Unfortunately, the command initialization overhead *really* slows things
down, but I can add a "control" test (with no cache tree refresh) to
quantify how long that initialization takes.

While looking into this, I found a few other things I'd like to add to/fix
in that test (add a "partially-invalidated" cache tree case, use the
original cache tree OID in 'prime_cache_tree()' rather than the OID at
HEAD), so I'll re-roll with those + the updated iteration logic.

Thanks for bringing this up!
 
quoted
The second patch introduces the 'skip_cache_tree_update' option for
'unpack_trees()', but does not use it yet.

The remaining three patches update callers that make the aforementioned
redundant cache tree updates. The performance impact on these callers ranges
from "negligible" (in 'rebase') to "substantial" (in 'read-tree') - more
details can be found in the commit messages of the patch associated with the
affected code path.
I found these patches well motivated and the code change to be so
unobtrusive that the benefits are well worth the new options.
Thanks!
Thanks,
-Stolee
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help