--- v7
+++ v4
@@ -26,475 +26,111 @@
Thanks, -Stolee
-Update in V7 (relative to v5)
-=============================
+Updates in V4
+=============
-APOLOGIES: As I was working on this cover letter, I was still organizing my
-big list of patches, including reordering some into this series. I forgot to
-actually include them in my v6 submission, so here is a re-submission.
-Please ignore v6.
+ * The previous patch "unpack-trees: stop recursing into sparse directories"
+ was confusing, and actually a bit sloppy.
+ * It has been replaced with "unpack-trees: be careful around sparse
+ directory entries" which takes the sparse-directory checks and raises
+ them higher up into unpack_trees.c instead of in diff-lib.c.
-I'm sorry that this revision took so long. Initially I was blocked on
-getting the directory/file conflict figured out (I did), but also my team
-was very busy with some things. Eventually, we reached an internal deadline
-to make an experimental release available [1] with initial sparse-index
-performance boosts. Creating that included some additional review by Jeff
-Hostetler and Johannes Schindelin which led to more changes in this version.
-The good news is that this series has now created the basis for many Git
-commands to integrate with the sparse-index without much additional work.
-This effort was unfortunately overloaded on this series because the changes
-needed for things like 'git checkout' or 'git add' all intersect with the
-changes needed for 'git status'. Might as well get it right the first time.
+Updates in V3
+=============
-Because the range-diff is a big difficult to read this time, I'll break the
-changes down on a patch-by-patch basis.
+Sorry that this was a long time coming. I got a little side-tracked on other
+projects, but I also worked to get the sparse-index feature working against
+the Scalar functional tests, which contain many special cases around the
+sparse-checkout feature as they were inherited from special cases that arose
+in the virtualized environment of VFS for Git. This version contains my
+fixes based on that investigation. Most of these were easy to identify and
+fix, but I was blocked for a long time struggling with a bug when combining
+the sparse-index with the builtin FS Monitor feature, but I've reported my
+findings already [1].
- 1. sparse-index: skip indexes with unmerged entries
-
- (no change)
+[1]
+https://lore.kernel.org/git/0b9e54ba-ac27-e537-7bef-1b4448f92352@gmail.com/
- 2. sparse-index: include EXTENDED flag when expanding
+ * Updated comments and tests based on the v2 feedback.
+ * Expanded the test repository data shape based on the special cases found
+ during my investigation.
+ * Added several commits that either fix errors in the status code, or fix
+ errors in the previous sparse-index series, specifically:
+ * When in a conflict state, the cache-tree fails to update. For now, skip
+ writing a sparse-index until this can be resolved more carefully.
+ * When expanding a sparse-directory entry, we set the CE_SKIP_WORKTREE
+ bit but forgot the CE_EXTENDED bit.
+ * git status had failures if there was a sparse-directory entry as the
+ first entry within a directory.
+ * When expanding a directory to report its status, such as when a
+ sparse-directory is staged but doesn't exist at HEAD (such as in an
+ orphaned commit) we did not previously recurse correctly into
+ subdirectories.
+ * Be extra careful with the FS Monitor data when expanding or contracting
+ an index. This version now abandons all FS Monitor data at these
+ conversion points with the expectation that in the future these
+ conversions will be rare so the FS Monitor feature can work
+ efficiently. Updates in V2
- * Commit message better describes the purpose of the change.
+----------------------------------------------------------------------------
- 3. t1092: replace incorrect 'echo' with 'cat'
+ * Based on the feedback, it is clear that 'git add' will require much more
+ careful testing and thought. I'm splitting it out of this series and it
+ will return with a follow-up.
+ * Test cases are improved, both in coverage and organization.
+ * The previous "unpack-trees: make sparse aware" patch is split into three
+ now.
+ * Stale messages based on an old implementation of the "protections" topic
+ are now fixed.
+ * Performance tests were re-run.
- * Typo fix
-
- 4. t1092: expand repository data shape
-
- * some files are added that surround "folder1/" immediately before and
- after, based on the sorting with the trailing slash. This provides extra
- coverage.
-
- 5. t1092: add tests for status/add and sparse files
-
- (no change)
-
- 6. unpack-trees: preserve cache_bottom
-
- (no change)
-
- 7. unpack-trees: compare sparse directories correctly
-
- * We were previosly not comparing the path lengths, which causes a problem
- (with later updates) when a sparse directory such as "folder1/0/" gets
- compared to a tree name "folder1".
-
- 8. unpack-trees: rename unpack_nondirectories()
-
- * This new commit changes the name to make more sense with its new behavior
- that could modify a sparse directory entry. The point of the method is in
- contrast to recursing into trees.
-
- 9. unpack-trees: unpack sparse directory entries
-
- * THIS is the biggest change from previous versions. There were a few
- things going on that were tricky to get right, especially with the
- directory/file conflict (handled in an update in the following, new
- patch).
-
- * The changes to create_ce_entry() regarding alloc_len missed a spot that
- was critical to getting the length right in the allocated entry.
-
- * Use '\0' over 0 to represent the terminating character.
-
- * We don't need a "sparse_directory" parameter to unpack_nondirectories()
- (which was renamed to unpack_single_entry() by the previous new patch)
- because we can use dirmask to discover if src[0] (or any other value)
- should be a sparse directory entry.
-
- * Similarly, we don't need to call the method twice from unpack_callback().
-
- * The 'conflicts' variable is set to match the dirmask in the beginning,
- but it should depend on whether or not we have a sparse directory entry
- instead, and if all trees that have the path have a directory.
-
- * The implementation of find_cache_entry() uses find_cache_pos() to find an
- insertion position for a path if it doesn't find an exact match. Before,
- we subtracted one to find the sparse directory entry, but there could be
- multiple paths between the sparse directory entry and the insertion
- point, so we need to walk backwards until we find it. This requires many
- paths having the same prefix, so hopefully is a rare case. Some of the
- test data changes were added to cover the need for this logic. This uses
- a helper method, sparse_dir_matches_path, which is also used by
- is_sparse_directory_entry.
-
- 10. unpack-trees: handle dir/file conflict of sparse entries
-
- * This new logic inside twoway_merge handles the special case for dealing
- with a directory/file conflict during a 'git checkout'. The necessarily
- data and tests are also added here, though the logic will only take
- serious effect when we integrate with 'git checkout' later.
-
- 11. dir.c: accept a directory as part of cone-mode patterns
-
- * The value slash_pos was previously a pointer within a strbuf, but in some
- cases we add to that strbuf and that could reallocate the pointer, making
- slash_pos be invalid. The replacement is to have slash_pos be an integer
- position within the string, so it is consistent even if the string is
- reallocated for an append.
-
- 12. diff-lib: handle index diffs with sparse dirs
-
- * As recommended in the previous review, a simple diff_tree_oid() replaces
- the complicated use of read_tree_at() and traverse_trees() in the
- previous version.
-
- 13. status: skip sparse-checkout percentage with sparse-index
-
- (no change)
-
- 14. status: use sparse-index throughout
-
- (no change)
-
- 15. wt-status: expand added sparse directory entries
-
- * Duplicate 'git status --porcelain=v2' lines are removed from tests.
-
- * The pathspec is initialized using "= { 0 }" instead of memset().
-
- 16. fsmonitor: integrate with sparse index
-
- * An extra test_region is added to ensure that the filesystem monitor hook
- is still being called, and we are not simply disabling the feature
- entirely.
-
-Derrick Stolee (16):
+Derrick Stolee (12):
sparse-index: skip indexes with unmerged entries
sparse-index: include EXTENDED flag when expanding
- t1092: replace incorrect 'echo' with 'cat'
t1092: expand repository data shape
t1092: add tests for status/add and sparse files
unpack-trees: preserve cache_bottom
unpack-trees: compare sparse directories correctly
- unpack-trees: rename unpack_nondirectories()
- unpack-trees: unpack sparse directory entries
- unpack-trees: handle dir/file conflict of sparse entries
+ unpack-trees: be careful around sparse directory entries
dir.c: accept a directory as part of cone-mode patterns
- diff-lib: handle index diffs with sparse dirs
status: skip sparse-checkout percentage with sparse-index
status: use sparse-index throughout
wt-status: expand added sparse directory entries
fsmonitor: integrate with sparse index
builtin/commit.c | 3 +
- diff-lib.c | 16 ++
- dir.c | 24 ++-
+ dir.c | 11 +++
read-cache.c | 10 +-
- sparse-index.c | 27 +++-
- t/t1092-sparse-checkout-compatibility.sh | 181 ++++++++++++++++++++++-
- t/t7519-status-fsmonitor.sh | 49 ++++++
- unpack-trees.c | 145 +++++++++++++++---
- wt-status.c | 65 +++++++-
+ sparse-index.c | 27 +++++-
+ t/t1092-sparse-checkout-compatibility.sh | 117 ++++++++++++++++++++++-
+ t/t7519-status-fsmonitor.sh | 48 ++++++++++
+ unpack-trees.c | 26 ++++-
+ wt-status.c | 64 ++++++++++++-
wt-status.h | 1 +
- 10 files changed, 485 insertions(+), 36 deletions(-)
+ 9 files changed, 295 insertions(+), 12 deletions(-)
-base-commit: ebf3c04b262aa27fbb97f8a0156c2347fecafafb
-Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v7
-Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v7
+base-commit: f723f370c89ad61f4f40aabfd3540b1ce19c00e5
+Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-932%2Fderrickstolee%2Fsparse-index%2Fstatus-and-add-v4
+Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-932/derrickstolee/sparse-index/status-and-add-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/932
-Range-diff vs v6:
+Range-diff vs v3:
- 1: 2a4a7256304 = 1: 2a4a7256304 sparse-index: skip indexes with unmerged entries
- 2: f5bae86014d = 2: f5bae86014d sparse-index: include EXTENDED flag when expanding
- 3: d965669c766 = 3: d965669c766 t1092: replace incorrect 'echo' with 'cat'
- 4: 44a940211b2 ! 4: e10fa11cfdb t1092: expand repository data shape
- @@ Commit message
- one entry and are the first entry of a directory with multiple
- entries.
-
- + * Add filenames adjacent to a sparse directory entry that sort before
- + and after the trailing slash.
- +
- Later tests will take advantage of these shapes, but they also deepen
- the tests that already exist.
-
- @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'setup' '
- + mkdir deep/deeper1/0/0 &&
- + touch deep/deeper1/0/1 &&
- + touch deep/deeper1/0/0/0 &&
- ++ >folder1- &&
- ++ >folder1.x &&
- ++ >folder10 &&
- + cp -r deep/deeper1/0 folder1 &&
- + cp -r deep/deeper1/0 folder2 &&
- + echo >>folder1/0/0/0 &&
- 5: 701ac0e8ff6 = 5: e94ffa07d46 t1092: add tests for status/add and sparse files
- 6: 587333f7c61 = 6: a8dda933567 unpack-trees: preserve cache_bottom
- 7: 6fc898ac23e ! 7: e52166f6e4c unpack-trees: compare sparse directories correctly
- @@ Commit message
- Within compare_entry(), it first calls do_compare_entry() to check the
- leading portion of the name. When the input path is a directory name, we
- could match exactly already. Thus, we should return 0 if we have an
- - exact string match on a sparse directory entry.
- + exact string match on a sparse directory entry. The final check is a
- + length comparison between the strings.
-
- Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
-
- @@ unpack-trees.c: static int compare_entry(const struct cache_entry *ce, const str
- + * works when the input name is a directory, since ce->name
- + * ends in a directory separator.
- + */
- -+ if (S_ISSPARSEDIR(ce->ce_mode))
- ++ if (S_ISSPARSEDIR(ce->ce_mode) &&
- ++ ce->ce_namelen == traverse_path_len(info, tree_entry_len(n)) + 1)
- + return 0;
- +
- /*
- -: ----------- > 8: d04b62381b8 unpack-trees: rename unpack_nondirectories()
- 8: b676ef4925b ! 9: 237ccf4e43d unpack-trees: unpack sparse directory entries
- @@ Commit message
- might want to perform a merge operation on the entry, such as during
- 'git checkout <commit>' which wants to replace a sparse tree entry with
- the tree for that path at the target commit. We extend the logic within
- - unpack_nondirectories() to create a sparse-directory entry in this case,
- - and then that is sent to call_unpack_fn(). Since the name becomes
- - confusing by handling directories, rename it to unpack_single_entry()
- - since it handles a blob entry or a sparse directory entry without using
- - traverse_trees_recursive().
- + unpack_single_entry() to create a sparse-directory entry in this case,
- + and then that is sent to call_unpack_fn().
-
- There are some subtleties in this process. For instance, we need to
- update find_cache_entry() to allow finding a sparse-directory entry that
- exactly matches a given path. Use the new helper method
- - sparse_dir_matches_path() for this.
- + sparse_dir_matches_path() for this. We also need to ignore conflict
- + markers in the case that the entries correspond to directories and we
- + already have a sparse directory entry.
-
- Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
-
- @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
- return ce;
- }
-
- -@@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse_info *info,
- - * without actually calling it. If you change the logic here you may need to
- - * check and change there as well.
- - */
- --static int unpack_nondirectories(int n, unsigned long mask,
- -- unsigned long dirmask,
- -- struct cache_entry **src,
- -- const struct name_entry *names,
- -- const struct traverse_info *info)
- -+static int unpack_single_entry(int n, unsigned long mask,
- -+ unsigned long dirmask,
- -+ struct cache_entry **src,
- -+ const struct name_entry *names,
- -+ const struct traverse_info *info,
- -+ int sparse_directory)
- - {
- - int i;
- +@@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
- struct unpack_trees_options *o = info->data;
- unsigned long conflicts = info->df_conflicts | dirmask;
-
- @@ unpack-trees.c: static struct cache_entry *create_ce_entry(const struct traverse
- if (mask == dirmask && !src[0])
- return 0;
-
- -+ /* defer work if our cache entry doesn't match the expectations. */
- -+ if (sparse_directory) {
- -+ if (src[0] && !S_ISSPARSEDIR(src[0]->ce_mode))
- -+ BUG("expected sparse directory entry");
- -+ } else if (src[0] && S_ISSPARSEDIR(src[0]->ce_mode)) {
- -+ return 0;
- -+ }
- ++ /*
- ++ * When we have a sparse directory entry for src[0],
- ++ * then this isn't necessarily a directory-file conflict.
- ++ */
- ++ if (mask == dirmask && src[0] &&
- ++ S_ISSPARSEDIR(src[0]->ce_mode))
- ++ conflicts = 0;
- +
- /*
- * Ok, we've filled in up to any potential index entry in src[0],
- * now do the rest.
- -@@ unpack-trees.c: static int unpack_nondirectories(int n, unsigned long mask,
- +@@ unpack-trees.c: static int unpack_single_entry(int n, unsigned long mask,
- * not stored in the index. otherwise construct the
- * cache entry from the index aware logic.
- */
- - src[i + o->merge] = create_ce_entry(info, names + i, stage, &o->result, o->merge);
- + src[i + o->merge] = create_ce_entry(info, names + i, stage,
- + &o->result, o->merge,
- -+ sparse_directory);
- ++ bit & dirmask);
- }
-
- if (o->merge) {
- @@ unpack-trees.c: static int find_cache_pos(struct traverse_info *info,
- + * Check for a sparse-directory entry named "path/".
- + * Due to the input p->path not having a trailing
- + * slash, the negative 'pos' value overshoots the
- -+ * expected position by one, hence "-2" here.
- ++ * expected position by at least one, hence "-2" here.
- + */
- + pos = -pos - 2;
- +
- + if (pos < 0 || pos >= o->src_index->cache_nr)
- return NULL;
- +
- -+ ce = o->src_index->cache[pos];
- ++ /*
- ++ * We might have multiple entries between 'pos' and
- ++ * the actual sparse-directory entry, so start walking
- ++ * back until finding it or passing where it would be.
- ++ */
- ++ while (pos >= 0) {
- ++ ce = o->src_index->cache[pos];
- ++
- ++ if (strncmp(ce->name, p->path, p->pathlen))
- ++ return NULL;
- +
- -+ if (!S_ISSPARSEDIR(ce->ce_mode))
- -+ return NULL;
- ++ if (S_ISSPARSEDIR(ce->ce_mode) &&
- ++ sparse_dir_matches_path(ce, info, p))
- ++ return ce;
- +
- -+ if (sparse_dir_matches_path(ce, info, p))
- -+ return ce;
- ++ pos--;
- ++ }
- +
- + return NULL;
- }
- @@ unpack-trees.c: static void debug_unpack_callback(int n,
- /*
- * Note that traverse_by_cache_tree() duplicates some logic in this function
- * without actually calling it. If you change the logic here you may need to
- -@@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
- - }
- - }
- -
- -- if (unpack_nondirectories(n, mask, dirmask, src, names, info) < 0)
- -+ if (unpack_single_entry(n, mask, dirmask, src, names, info, 0) < 0)
- - return -1;
- -
- - if (o->merge && src[0]) {
- @@ unpack-trees.c: static int unpack_callback(int n, unsigned long mask, unsigned long dirmask, str
- }
- }
-
- - if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
- - names, info) < 0)
- -+ if (is_sparse_directory_entry(src[0], names, info)) {
- -+ if (unpack_single_entry(n, dirmask, mask & ~dirmask, src, names, info, 1) < 0)
- -+ return -1;
- -+ } else if (traverse_trees_recursive(n, dirmask, mask & ~dirmask,
- ++ if (!is_sparse_directory_entry(src[0], names, info) &&
- ++ traverse_trees_recursive(n, dirmask, mask & ~dirmask,
- + names, info) < 0) {
- return -1;
- + }
- -: ----------- > 10: 9f31c691af6 unpack-trees: handle dir/file conflict of sparse entries
- 9: d693f00d9a2 ! 11: 2a43287c47e dir.c: accept a directory as part of cone-mode patterns
- @@ Commit message
- Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
-
- ## dir.c ##
- +@@ dir.c: enum pattern_match_result path_matches_pattern_list(
- + struct path_pattern *pattern;
- + struct strbuf parent_pathname = STRBUF_INIT;
- + int result = NOT_MATCHED;
- +- const char *slash_pos;
- ++ size_t slash_pos;
- +
- + if (!pl->use_cone_patterns) {
- + pattern = last_matching_pattern_from_list(pathname, pathlen, basename,
- @@ dir.c: enum pattern_match_result path_matches_pattern_list(
- strbuf_addch(&parent_pathname, '/');
- strbuf_add(&parent_pathname, pathname, pathlen);
- @@ dir.c: enum pattern_match_result path_matches_pattern_list(
- + * use the file-base matching logic in an equivalent way.
- + */
- + if (parent_pathname.len > 0 &&
- -+ parent_pathname.buf[parent_pathname.len - 1] == '/')
- ++ parent_pathname.buf[parent_pathname.len - 1] == '/') {
- ++ slash_pos = parent_pathname.len - 1;
- + strbuf_add(&parent_pathname, "-", 1);
- ++ } else {
- ++ const char *slash_ptr = strrchr(parent_pathname.buf, '/');
- ++ slash_pos = slash_ptr ? slash_ptr - parent_pathname.buf : 0;
- ++ }
- +
- if (hashmap_contains_path(&pl->recursive_hashmap,
- &parent_pathname)) {
- result = MATCHED_RECURSIVE;
- + goto done;
- + }
- +
- +- slash_pos = strrchr(parent_pathname.buf, '/');
- +-
- +- if (slash_pos == parent_pathname.buf) {
- ++ if (!slash_pos) {
- + /* include every file in root */
- + result = MATCHED;
- + goto done;
- + }
- +
- +- strbuf_setlen(&parent_pathname, slash_pos - parent_pathname.buf);
- ++ strbuf_setlen(&parent_pathname, slash_pos);
- +
- + if (hashmap_contains_path(&pl->parent_hashmap, &parent_pathname)) {
- + result = MATCHED;
- 10: ed11cfc791f ! 12: f83aa08ff6b diff-lib: handle index diffs with sparse dirs
- @@ Commit message
- identical to before: the lack of a cache entry is the same with a sparse
- index.
-
- - Use diff_tree_oid() appropriately to appropriately compute the diff.
- + Use diff_tree_oid() appropriately to compute the diff.
-
- Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
-
- 11: 48fd25aacbe = 13: 35063ffb8ed status: skip sparse-checkout percentage with sparse-index
- 12: 3499105eb67 = 14: b4033a9bf36 status: use sparse-index throughout
- 13: 08225483d69 ! 15: 717a3f49f97 wt-status: expand added sparse directory entries
- @@ wt-status.c: static void wt_status_collect_changes_index(struct wt_status *s)
- + struct string_list_item *it;
- + struct wt_status_change_data *d;
- + struct wt_status *s = context;
- -+ char *full_name;
- ++ struct strbuf full_name = STRBUF_INIT;
- +
- + if (S_ISDIR(mode))
- + return READ_TREE_RECURSIVE;
- +
- -+ full_name = xstrfmt("%s%s", base->buf, path);
- -+ it = string_list_insert(&s->change, full_name);
- ++ strbuf_add(&full_name, base->buf, base->len);
- ++ strbuf_addstr(&full_name, path);
- ++ it = string_list_insert(&s->change, full_name.buf);
- + d = it->util;
- + if (!d) {
- + CALLOC_ARRAY(d, 1);
- @@ wt-status.c: static void wt_status_collect_changes_index(struct wt_status *s)
- + d->mode_index = mode;
- + oidcpy(&d->oid_index, oid);
- + s->committable = 1;
- ++ strbuf_release(&full_name);
- + return 0;
- +}
- +
- 14: 711e403a63a ! 16: 1d744848ee6 fsmonitor: integrate with sparse index
- @@ t/t7519-status-fsmonitor.sh: test_expect_success 'status succeeds after staging/
- + GIT_TRACE2_EVENT="$(pwd)/trace2.txt" GIT_TRACE2_EVENT_NESTING=10 \
- + git status --porcelain=v2 >actual &&
- + test_region $1 index ensure_full_index trace2.txt &&
- ++ test_region fsm_hook query trace2.txt &&
- + test_cmp expect actual &&
- + rm trace2.txt &&
- + git sparse-checkout disable
+ 1: 5a2ed3d1d701 = 1: 5a2ed3d1d701 sparse-index: skip indexes with unmerged entries
+ 2: 8aa41e749471 = 2: 8aa41e749471 sparse-index: include EXTENDED flag when expanding
+ 3: 70971b1f9261 = 3: 70971b1f9261 t1092: expand repository data shape
+ 4: a80b5a41153f = 4: a80b5a41153f t1092: add tests for status/add and sparse files
+ 5: 07a45b661c4a = 5: 07a45b661c4a unpack-trees: preserve cache_bottom
+ 6: cc4a526e7947 = 6: cc4a526e7947 unpack-trees: compare sparse directories correctly
+ 7: 598375d3531f < -: ------------ unpack-trees: stop recursing into sparse directories
+ -: ------------ > 7: e28df7f9395d unpack-trees: be careful around sparse directory entries
+ 8: 47da2b317237 = 8: 2cc3a93d4434 dir.c: accept a directory as part of cone-mode patterns
+ 9: bc1512981493 = 9: 5011feb1aa04 status: skip sparse-checkout percentage with sparse-index
+ 10: 5b1ae369a7cd = 10: 9f2ce5301dc9 status: use sparse-index throughout
+ 11: 3b42783d4a86 = 11: 24417e095243 wt-status: expand added sparse directory entries
+ 12: b72507f514d1 = 12: 584d4b559a91 fsmonitor: integrate with sparse index
--
gitgitgadget