Re: [PATCH 0/5] Skip 'cache_tree_update()' when 'prime_cache_tree()' is called immediate after
From: Derrick Stolee <hidden>
Date: 2022-11-10 14:44:42
On 11/9/2022 5:18 PM, Victoria Dye wrote:
Derrick Stolee wrote:quoted
On 11/8/2022 5:44 PM, Victoria Dye via GitGitGadget wrote:quoted
Following up on a discussion [1] around cache tree refreshes in 'git reset', this series updates callers of 'unpack_trees()' to skip its internal invocation of 'cache_tree_update()' when 'prime_cache_tree()' is called immediately after 'unpack_trees()'. 'cache_tree_update()' can be an expensive operation, and it is redundant when 'prime_cache_tree()' clears and rebuilds the cache tree from scratch immediately after. The first patch adds a test directly comparing the execution time of 'prime_cache_tree()' with that of 'cache_tree_update()'. The results show that on a fully-valid cache tree, they perform the same, but on a fully-invalid cache tree, 'prime_cache_tree()' is multiple times faster (although both are so fast that the total execution time of 100 invocations is needed to compare the results in the default perf repo).One thing I found interesting is how you needed 200 iterations to show a meaningful change in this test script, but in the case of 'git reset' we can see sizeable improvements even with a single iteration.All of the new performance tests run with multiple iterations: 20 for reset (10 iterations of two resets each), 100 for read-tree, 200 for the comparison of 'cache_tree_update()' & 'prime_cache_tree()'. Those counts were picked mostly by trial-and-error, to strike a balance of "the test doesn't take too long to run" and "the change in execution time is clearly visible in the results."
Thanks for pointing out my misunderstanding. I missed the repeat counts because 2-3 seconds "seemed right" based on performance tests of large monorepos, but clearly that's not right when using the Git repository for performance tests.
quoted
Is there something about this test that is artificially speeding up these iterations? Perhaps the index has up-to-date filesystem information that allows these methods to avoid filesystem interactions that are necessary in the 'git reset' case?I would expect the "cache_tree_update, invalid" test's execution time, when scaled to the iterations of 'read-tree' and 'reset', to match the change in timing of those commands, but the command tests are reporting *much* larger improvements (e.g., I'd expect a 0.27s improvement in 'git read-tree', but the results are *consistently* >=0.9s). Per trace2 logs, a single invocation of 'read-tree' matching the one added in 'p0006' spent 0.010108s in 'cache_tree_update()'. Over 100 iterations, the total time would be ~1.01s, which lines up with the 'p0006' test results. However, the trace2 results for "test-tool cache-tree --count 3 --fresh --update" show the first iteration taking 0.013060s (looks good), then the next taking 0.003755s, then 0.004026s (_much_ faster than expected). To be honest, I can't figure out what's going on there. It might be some kind of runtime/memory optimization with repeatedly rebuilding the same cache tree (doesn't seem to be compiler optimization, since the speedup still happens with '-O0'). The only sure-fire way to avoid it seems to be moving the iteration outside of 'test-cache-tree.c' and into 'p0090'. Unfortunately, the command initialization overhead *really* slows things down, but I can add a "control" test (with no cache tree refresh) to quantify how long that initialization takes.
Getting unit-level performance tests is always tricky. Sometimes the best way to do it is to collect a sample using GIT_TRACE2_PERF and then manually collect the region times. It could be a fun project to integrate region measurements into the performance test suite instead of only end-to-end timings.
While looking into this, I found a few other things I'd like to add to/fix in that test (add a "partially-invalidated" cache tree case, use the original cache tree OID in 'prime_cache_tree()' rather than the OID at HEAD), so I'll re-roll with those + the updated iteration logic.
Taking a look now. Thanks! -Stolee