Re: git bug report: 'git add' hangs in a large repo which has sparse-checkout file with large number of patterns in it
From: Elijah Newren <hidden>
Date: 2022-06-30 03:10:37
On Wed, Jun 29, 2022 at 12:50 PM Dian Xu [off-list ref] wrote:
Dear Git developers,
Reporting Issue:
'git add' hangs in a large repo which has
sparse-checkout file with large number of patterns in it
Found in:
Git 2.34.3. Issue occurs after 'audit for interaction
with sparse-index' was introduced in add.c
Reproduction steps:
1. Clone a repo which has e.g. 2 million plus files
2. Enable sparse checkout by: git config core.sparsecheckout true
3. Create a .git/info/sparse-checkout file with a large
number of patterns, e.g. 16k plus lines

Did you run `git read-tree -mu HEAD` or even `git sparse-checkout reapply` after step 3 and before step 4? If not, you've left the working tree out of sync with the specified sparsity paths and should fix that before running step 4.

4. Run 'git add', which will hang
Alternatively to the above, if you really want to add a file and ignore the fact that it might be outside the sparsity patterns (and risk it later randomly disappearing with checkout/rebase/merge/etc. commands), then you can use `git add --sparse $FILENAME`.
Investigations:
1. Stack trace:
add.c: cmd_add
-> add.c: prune_directory
-> pathspec.c: add_pathspec_matches_against_index
-> dir.c: path_in_sparse_checkout_1
2. In Git 2.33.3, the loop at pathspec.c line 42 runs
fast, even when istate->cache_nr is at 2 million
3. Since Git 2.34.3, the newly introduced 'audit for
interaction with sparse-index' (dir.c line 1459:
path_in_sparse_checkout_1) decides to loop through 2 million files and
match each one of them against the sparse-checkout patterns
4. This hits the O(n^2) problem thus causes 'git add' to
hang (or ~1.5 hours to finish)
Please help us take a look at this issue and let us know if you need
more information.

I'm also curious whether you can use --cone mode in sparse-checkout. The O(N*M) behavior of sparse checkouts in non-cone mode is pretty fundamental, and we may need to add more code paths that check the sparsity patterns (i.e. more O(N*M) codepaths) to fix various user-observed bugs. Usage of --cone mode drops all of these to a linear cost.
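
For reference, switching to cone mode might look like the sketch below. It again uses a hypothetical throwaway repository with made-up paths (`src/lib`, `docs`, `README`); whether 16k hand-written patterns can be expressed as a list of directories depends on the real repository layout, which the report does not show.

```shell
#!/bin/sh
set -e

# Hypothetical throwaway repository; directory and file names are made up.
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p src/lib docs
echo a > src/lib/a.c
echo b > docs/b.txt
echo r > README
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m init

# Enable sparse checkout in cone mode, then select whole directories
# instead of writing individual patterns.
git sparse-checkout init --cone
git sparse-checkout set src/lib

# Show the directories currently selected.
git sparse-checkout list
```

In cone mode the selection is a set of directories rather than arbitrary patterns, so files at the top level and everything under `src/lib` stay checked out while `docs/` is dropped from the working tree.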