Thread (31 messages) 31 messages, 6 authors, 2022-11-14

Re: [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after

From: Derrick Stolee <hidden>
Date: 2022-11-10 14:44:42

On 11/9/2022 5:18 PM, Victoria Dye wrote:
Derrick Stolee wrote:
quoted
On 11/8/2022 5:44 PM, Victoria Dye via GitGitGadget wrote:
quoted
Following up on a discussion [1] around cache tree refreshes in 'git reset',
this series updates callers of 'unpack_trees()' to skip its internal
invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called
immediately after 'unpack_trees()'. 'cache_tree_update()' can be an
expensive operation, and it is redundant when 'prime_cache_tree()' clears
and rebuilds the cache tree from scratch immediately after.

The first patch adds a test directly comparing the execution time of
'prime_cache_tree()' with that of 'cache_tree_update()'. The results show
that on a fully-valid cache tree, they perform the same, but on a
fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster
(although both are so fast that the total execution time of 100 invocations
is needed to compare the results in the default perf repo).
One thing I found interesting is how you needed 200 iterations to show
a meaningful change in this test script, but in the case of 'git reset'
we can see sizeable improvements even with a single iteration.
All of the new performance tests run with multiple iterations: 20 for reset
(10 iterations of two resets each), 100 for read-tree, 200 for the
comparison of 'cache_tree_update()' & 'prime_cache_tree()'. Those counts
were picked mostly by trial-and-error, to strike a balance of "the test
doesn't take too long to run" and "the change in execution time is clearly
visible in the results."
Thanks for pointing out my misunderstanding. I missed the repeat counts
because 2-3 seconds "seemed right" based on performance tests of large
monorepos, but clearly that's not right when using the Git repository for
performance tests.
quoted
Is there something about this test that is artificially speeding up
these iterations? Perhaps the index has up-to-date filesystem information
that allows these methods to avoid filesystem interactions that are
necessary in the 'git reset' case?
I would expect the "cache_tree_update, invalid" test's execution time, when
scaled to the iterations of 'read-tree' and 'reset', to match the change in
timing of those commands, but the command tests are reporting *much* larger
improvements (e.g., I'd expect a 0.27s improvement in 'git read-tree', but
the results are *consistently* >=0.9s).

Per trace2 logs, a single invocation of 'read-tree' matching the one added
in 'p0006' spent 0.010108s in 'cache_tree_update()'. Over 100 iterations,
the total time would be ~1.01s, which lines up with the 'p0006' test
results. However, the trace2 results for "test-tool cache-tree --count 3
--fresh --update" show the first iteration taking 0.013060s (looks good),
then the next taking 0.003755s, then 0.004026s (_much_ faster than
expected).

To be honest, I can't figure out what's going on there. It might be some
kind of runtime/memory optimization with repeatedly rebuilding the same
cache tree (doesn't seem to be compiler optimization, since the speedup
still happens with '-O0'). The only sure-fire way to avoid it seems to be
moving the iteration outside of 'test-cache-tree.c' and into 'p0090'.
Unfortunately, the command initialization overhead *really* slows things
down, but I can add a "control" test (with no cache tree refresh) to
quantify how long that initialization takes.
Getting unit-level performance tests is always tricky. Sometimes the best
way to do it is to collect a sample using GIT_TRACE2_PERF and then manually
collect the region times. It could be a fun project to integrate region
measurements into the performance test suite instead of only end-to-end
timings.
 
While looking into this, I found a few other things I'd like to add to/fix
in that test (add a "partially-invalidated" cache tree case, use the
original cache tree OID in 'prime_cache_tree()' rather than the OID at
HEAD), so I'll re-roll with those + the updated iteration logic.
Taking a look now. Thanks!

-Stolee
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help