Thread (12 messages): 4 authors, 2022-07-12

Re: git bug report: 'git add' hangs in a large repo which has sparse-checkout file with large number of patterns in it

From: Elijah Newren <hidden>
Date: 2022-06-30 03:10:37

On Wed, Jun 29, 2022 at 12:50 PM Dian Xu [off-list ref] wrote:
> Dear Git developers,
>
> Reporting Issue:
>     'git add' hangs in a large repo which has a sparse-checkout file
>     with a large number of patterns in it
>
> Found in:
>     Git 2.34.3. The issue occurs after 'audit for interaction with
>     sparse-index' was introduced in add.c
>
> Reproduction steps:
>     1. Clone a repo which has e.g. 2 million plus files
>     2. Enable sparse checkout by: git config core.sparsecheckout true
>     3. Create a .git/info/sparse-checkout file with a large number of
>        patterns, e.g. 16k plus lines

Did you run `git read-tree -mu HEAD` or even `git sparse-checkout
reapply` after step 3 and before step 4? If not, you've left the
working tree out of sync with the specified sparsity paths and should
fix that before running step 4.

>     4. Run 'git add', which will hang

Alternatively, if you really want to add a file and ignore the fact
that it might be outside the sparsity patterns (and risk it later
disappearing seemingly at random with checkout/rebase/merge/etc.
commands), you can use `git add --sparse $FILENAME`.

> Investigations:
>     1. Stack trace:
>            add.c: cmd_add
>         -> add.c: prune_directory
>         -> pathspec.c: add_pathspec_matches_against_index
>         -> dir.c: path_in_sparse_checkout_1
>     2. In Git 2.33.3, the loop at pathspec.c line 42 runs fast, even
>        when istate->cache_nr is at 2 million
>     3. Since Git 2.34.3, the newly introduced 'audit for interaction
>        with sparse-index' (dir.c line 1459: path_in_sparse_checkout_1)
>        loops through all 2 million files and matches each one of them
>        against the sparse-checkout patterns
>     4. This hits an O(n^2) problem, which causes 'git add' to hang
>        (or take ~1.5 hours to finish)
>
> Please help us take a look at this issue and let us know if you need
> more information.
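To make the cost described above concrete, here is a minimal Python sketch of non-cone matching; it is not git's C code. It assumes a simplified model: `fnmatch.fnmatchcase` stands in for git's wildmatch, gitignore details such as anchoring and directory-only patterns are ignored, and the helpers `matches_noncone` and `audit_index` are hypothetical names. Each index entry is tried against the patterns from last to first until one matches, so the worst case is one check per (path, pattern) pair, i.e. O(N*M) for N index entries and M patterns.

```python
from fnmatch import fnmatchcase


def matches_noncone(path, patterns):
    """Return (included, checks) for one path under simplified non-cone rules.

    Assumption: fnmatchcase stands in for git's wildmatch, and a leading
    '!' negates a pattern. Patterns are scanned from last to first, so the
    last matching pattern decides; a path matching nothing is treated as
    outside the sparse checkout (and costs a check against every pattern).
    """
    checks = 0
    for pat in reversed(patterns):
        checks += 1
        negated = pat.startswith("!")
        glob = pat[1:] if negated else pat
        if fnmatchcase(path, glob):
            return not negated, checks
    return False, checks


def audit_index(paths, patterns):
    """Check every index entry; total work grows as len(paths) * len(patterns)."""
    inside = []
    total_checks = 0
    for path in paths:
        included, checks = matches_noncone(path, patterns)
        total_checks += checks
        if included:
            inside.append(path)
    return inside, total_checks


if __name__ == "__main__":
    paths = ["src/a.c", "docs/b.md", "tests/c.py"]
    patterns = ["src/*", "tests/*"]
    print(audit_index(paths, patterns))  # prints (['src/a.c', 'tests/c.py'], 5)
```

With the numbers from the report, N ≈ 2,000,000 and M ≈ 16,000, the worst case is about 2,000,000 × 16,000 = 3.2 × 10^10 pattern checks, which makes a run measured in hours plausible.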

I'm also curious whether you can use --cone mode in sparse-checkout. The
O(N*M) behavior of sparse checkouts in non-cone mode is pretty
fundamental, and we may need to add more code paths that check the
sparsity patterns (i.e. more O(N*M) code paths) to fix various
user-observed bugs. Using --cone mode drops all of these to a linear
cost.
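For contrast, a minimal Python sketch of why cone mode can be linear. Cone-mode patterns name directories only, so membership can be answered with hash-set lookups on a path's leading directories instead of trying every pattern. The rules modeled are cone mode's documented behavior (top-level files are always included, a listed directory is included recursively, and files directly inside each ancestor of a listed directory are included); the helpers `build_cone` and `in_cone` are hypothetical, and git's own implementation is in C.

```python
def build_cone(directories):
    """Precompute hash sets from a list of cone-mode directories."""
    recursive = set()  # directories whose whole subtree is included
    parents = set()    # ancestors of those directories (immediate files only)
    for d in directories:
        d = d.strip("/")
        recursive.add(d)
        parts = d.split("/")
        for i in range(1, len(parts)):
            parents.add("/".join(parts[:i]))
    return recursive, parents


def in_cone(path, recursive, parents):
    """Decide membership with O(depth) set lookups, independent of how many
    directories were listed."""
    parts = path.split("/")
    if len(parts) == 1:
        return True  # top-level files are always included in cone mode
    for i in range(1, len(parts)):
        if "/".join(parts[:i]) in recursive:
            return True  # path lies under a recursively included directory
    # Files sitting directly in an ancestor of a listed directory are included.
    return "/".join(parts[:-1]) in parents


if __name__ == "__main__":
    recursive, parents = build_cone(["src/lib", "docs"])
    for p in ["README.md", "src/main.c", "src/lib/x/y.c", "src/other/z.c", "tests/t.py"]:
        print(p, in_cone(p, recursive, parents))
```

The trade-off is expressiveness: cone mode can only select whole directories, so arbitrary glob patterns would need to be recast as a list of directories to get this per-path cost.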