[PATCH v2 0/5] Duplicate entry hardening
From: Elijah Newren via GitGitGadget <hidden>
Date: 2026-06-14 06:37:29
We had some corrupt trees with duplicate entries in real world repositories, which triggered an assertion failure in merge-ort. Further, the corrupt tree creation in the third party tool would have been avoided had verify_cache() correctly checked for D/F conflicts. Provide fixes for both issues, including 3 preparatory changes for the merge-ort fix. Changes since v1: * Add some more detail to the commit message of Patch 1 * Split some code out in Patch 5 into a helper function. Elijah Newren (5): merge-ort: propagate callback errors from traverse_trees_wrapper() merge-ort: drop unnecessary show_all_errors from collect_merge_info() merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() merge-ort: abort merge when trees have duplicate entries cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts cache-tree.c | 64 +++++++++++++++++++---- merge-ort.c | 78 ++++++++++++++++------------ t/meson.build | 1 + t/t0093-direct-index-write.pl | 38 ++++++++++++++ t/t0093-verify-cache-df-gap.sh | 59 +++++++++++++++++++++ t/t6422-merge-rename-corner-cases.sh | 54 +++++++++++++++++++ 6 files changed, 249 insertions(+), 45 deletions(-) create mode 100644 t/t0093-direct-index-write.pl create mode 100755 t/t0093-verify-cache-df-gap.sh base-commit: e8955061076952cc5eab0300424fc48b601fe12d Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2096%2Fnewren%2Fduplicate-entry-hardening-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2096/newren/duplicate-entry-hardening-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/2096 Range-diff vs v1: 1: 282f906d1b ! 1: dc0f596f31 merge-ort: propagate callback errors from traverse_trees_wrapper() @@ Commit message traverse_trees_wrapper() saves entries from a first pass through traverse_trees() and then replays them through the real callback (collect_merge_info_callback). However, the replay loop silently - discards the callback return value. This means any error reported by - the callback during replay -- including a future check for malformed - trees -- would be ignored, allowing the merge to proceed with corrupt - state. + discards the callback return value. This is not a deferred error; + it is an ignored error. - Capture the return value, stop the loop on negative (error) returns, - and propagate the error to the caller. Note that the callback returns - a positive mask value on success, so we normalize non-negative returns - to 0 for the caller. + Today the only originator of a negative return in this entire call + graph is traverse_trees()'s "exceeded maximum allowed tree depth" + check; everything else (collect_merge_info_callback, + traverse_trees_wrapper, the inner traverse_trees recursion) only + relays that. So in current Git, the visible effect of dropping the + replay callback's return value is narrow but bad: a tree nested past + core.maxTreeDepth has its -1 swallowed, the subtree below the limit + is silently pruned, and the merge completes as if that were the + correct result. + + A later patch in this series will teach collect_merge_info_callback() + to return -1 on an additional path -- detecting duplicate + entries in malformed trees -- which is similarly handled today by + just ignoring the problem (resulting in mostly a "last one wins" rule, + though the non-last entry can mutate various state flags). + + Capture the return value, stop the loop on negative returns, and + propagate the error to the caller. The callback returns a positive mask + value on success, so normalize non-negative returns to + 0 for the caller. Signed-off-by: Elijah Newren [off-list ref] 2: 949b5d8e3f = 2: b4ff725a77 merge-ort: drop unnecessary show_all_errors from collect_merge_info() 3: d422f73e12 = 3: 673fbea13f merge-ort: free diff pairs queue in clear_or_reinit_internal_opts() 4: 0d81c027aa = 4: b5421244d4 merge-ort: abort merge when trees have duplicate entries 5: a87bbaa84f ! 5: cf50f1aabc cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts @@ Commit message Signed-off-by: Elijah Newren [off-list ref] ## cache-tree.c ## +@@ cache-tree.c: void cache_tree_invalidate_path(struct index_state *istate, const char *path) + istate->cache_changed |= CACHE_TREE_CHANGED; + } + ++/* ++ * Check whether this_ce and the next entry in the index form a D/F ++ * conflict ("path" vs "path/file"). Returns the conflicting "path/..." ++ * name when one is found, or NULL otherwise. ++ * ++ * The cache is sorted, so "path/file" sorts after "path" and the ++ * conflict is usually visible as adjacent entries. But other entries ++ * can sort between them -- e.g. "path-internal" sits between "path" ++ * and "path/file" because '-' (0x2D) precedes '/' (0x2F) -- so when ++ * the immediately following entry shares our prefix but starts with a ++ * character that sorts before '/', binary search for "path/" instead. ++ */ ++static const char *find_df_conflict(struct index_state *istate, ++ const struct cache_entry *this_ce, ++ const struct cache_entry *next_ce) ++{ ++ const char *this_name = this_ce->name; ++ const char *next_name = next_ce->name; ++ int this_len = ce_namelen(this_ce); ++ const struct cache_entry *other; ++ struct strbuf probe = STRBUF_INIT; ++ int pos; ++ ++ if (this_len >= ce_namelen(next_ce) || ++ next_name[this_len] > '/' || ++ strncmp(this_name, next_name, this_len)) ++ return NULL; ++ ++ if (next_name[this_len] == '/') ++ return next_name; ++ ++ strbuf_add(&probe, this_name, this_len); ++ strbuf_addch(&probe, '/'); ++ pos = index_name_pos_sparse(istate, probe.buf, probe.len); ++ strbuf_release(&probe); ++ ++ if (pos < 0) ++ pos = -pos - 1; ++ if (pos >= (int)istate->cache_nr) ++ return NULL; ++ other = istate->cache[pos]; ++ if (ce_namelen(other) > this_len && ++ other->name[this_len] == '/' && ++ !strncmp(this_name, other->name, this_len)) ++ return other->name; ++ return NULL; ++} ++ + static int verify_cache(struct index_state *istate, int flags) + { + unsigned i, funny; @@ cache-tree.c: static int verify_cache(struct index_state *istate, int flags) + */ + funny = 0; for (i = 0; i + 1 < istate->cache_nr; i++) { - /* path/file always comes after path because of the way - * the cache is sorted. Also path can appear only once, +- /* path/file always comes after path because of the way +- * the cache is sorted. Also path can appear only once, - * which means conflicting one would immediately follow. -+ * so path/file is likely the immediately following path -+ * but might be separated if there is e.g. a -+ * path-internal/... file. - */ +- */ const struct cache_entry *this_ce = istate->cache[i]; const struct cache_entry *next_ce = istate->cache[i + 1]; - const char *this_name = this_ce->name; - const char *next_name = next_ce->name; - int this_len = ce_namelen(this_ce); -+ const char *conflict_name = NULL; -+ - if (this_len < ce_namelen(next_ce) && +- const char *this_name = this_ce->name; +- const char *next_name = next_ce->name; +- int this_len = ce_namelen(this_ce); +- if (this_len < ce_namelen(next_ce) && - next_name[this_len] == '/' && -+ next_name[this_len] <= '/' && - strncmp(this_name, next_name, this_len) == 0) { -+ if (next_name[this_len] == '/') { -+ conflict_name = next_name; -+ } else if (next_name[this_len] < '/') { -+ /* -+ * The immediately next entry shares our -+ * prefix but sorts before "path/" (e.g., -+ * "path-internal" between "path" and -+ * "path/file", since '-' (0x2D) < '/' -+ * (0x2F)). Binary search to find where -+ * "path/" would be and check for a D/F -+ * conflict there. -+ */ -+ struct cache_entry *other; -+ struct strbuf probe = STRBUF_INIT; -+ int pos; -+ -+ strbuf_add(&probe, this_name, this_len); -+ strbuf_addch(&probe, '/'); -+ pos = index_name_pos_sparse(istate, -+ probe.buf, -+ probe.len); -+ strbuf_release(&probe); -+ -+ if (pos < 0) -+ pos = -pos - 1; -+ if (pos >= (int)istate->cache_nr) -+ continue; -+ other = istate->cache[pos]; -+ if (ce_namelen(other) > this_len && -+ other->name[this_len] == '/' && -+ !strncmp(this_name, other->name, this_len)) -+ conflict_name = other->name; -+ } -+ } +- strncmp(this_name, next_name, this_len) == 0) { ++ const char *conflict_name; + ++ conflict_name = find_df_conflict(istate, this_ce, next_ce); + if (conflict_name) { if (10 < ++funny) { fprintf(stderr, "...\n"); @@ cache-tree.c: static int verify_cache(struct index_state *istate, int flags) } fprintf(stderr, "You have both %s and %s\n", - this_name, next_name); -+ this_name, conflict_name); ++ this_ce->name, conflict_name); } } if (funny) -- gitgitgadget