Thread (23 messages) 23 messages, 5 authors, 14d ago
COOLING14d
Revisions (2)
  1. v1 [diff vs current]
  2. v2 current

[PATCH v2 0/5] Duplicate entry hardening

From: Elijah Newren via GitGitGadget <hidden>
Date: 2026-06-14 06:37:29

We had some corrupt trees with duplicate entries in real world repositories,
which triggered an assertion failure in merge-ort. Further, the corrupt tree
creation in the third party tool would have been avoided had verify_cache()
correctly checked for D/F conflicts. Provide fixes for both issues,
including 3 preparatory changes for the merge-ort fix.

Changes since v1:

 * Add some more detail to the commit message of Patch 1
 * Split some code out in Patch 5 into a helper function.

Elijah Newren (5):
  merge-ort: propagate callback errors from traverse_trees_wrapper()
  merge-ort: drop unnecessary show_all_errors from collect_merge_info()
  merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
  merge-ort: abort merge when trees have duplicate entries
  cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts

 cache-tree.c                         | 64 +++++++++++++++++++----
 merge-ort.c                          | 78 ++++++++++++++++------------
 t/meson.build                        |  1 +
 t/t0093-direct-index-write.pl        | 38 ++++++++++++++
 t/t0093-verify-cache-df-gap.sh       | 59 +++++++++++++++++++++
 t/t6422-merge-rename-corner-cases.sh | 54 +++++++++++++++++++
 6 files changed, 249 insertions(+), 45 deletions(-)
 create mode 100644 t/t0093-direct-index-write.pl
 create mode 100755 t/t0093-verify-cache-df-gap.sh


base-commit: e8955061076952cc5eab0300424fc48b601fe12d
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2096%2Fnewren%2Fduplicate-entry-hardening-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2096/newren/duplicate-entry-hardening-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/2096

Range-diff vs v1:

 1:  282f906d1b ! 1:  dc0f596f31 merge-ort: propagate callback errors from traverse_trees_wrapper()
     @@ Commit message
          traverse_trees_wrapper() saves entries from a first pass through
          traverse_trees() and then replays them through the real callback
          (collect_merge_info_callback).  However, the replay loop silently
     -    discards the callback return value.  This means any error reported by
     -    the callback during replay -- including a future check for malformed
     -    trees -- would be ignored, allowing the merge to proceed with corrupt
     -    state.
     +    discards the callback return value.  This is not a deferred error;
     +    it is an ignored error.
      
     -    Capture the return value, stop the loop on negative (error) returns,
     -    and propagate the error to the caller.  Note that the callback returns
     -    a positive mask value on success, so we normalize non-negative returns
     -    to 0 for the caller.
     +    Today the only originator of a negative return in this entire call
     +    graph is traverse_trees()'s "exceeded maximum allowed tree depth"
     +    check; everything else (collect_merge_info_callback,
     +    traverse_trees_wrapper, the inner traverse_trees recursion) only
     +    relays that.  So in current Git, the visible effect of dropping the
     +    replay callback's return value is narrow but bad: a tree nested past
     +    core.maxTreeDepth has its -1 swallowed, the subtree below the limit
     +    is silently pruned, and the merge completes as if that were the
     +    correct result.
     +
     +    A later patch in this series will teach collect_merge_info_callback()
     +    to return -1 on an additional path -- detecting duplicate
     +    entries in malformed trees -- which is similarly handled today by
     +    just ignoring the problem (resulting in mostly a "last one wins" rule,
     +    though the non-last entry can mutate various state flags).
     +
     +    Capture the return value, stop the loop on negative returns, and
     +    propagate the error to the caller.  The callback returns a positive mask
     +    value on success, so normalize non-negative returns to
     +    0 for the caller.
      
          Signed-off-by: Elijah Newren [off-list ref]
      
 2:  949b5d8e3f = 2:  b4ff725a77 merge-ort: drop unnecessary show_all_errors from collect_merge_info()
 3:  d422f73e12 = 3:  673fbea13f merge-ort: free diff pairs queue in clear_or_reinit_internal_opts()
 4:  0d81c027aa = 4:  b5421244d4 merge-ort: abort merge when trees have duplicate entries
 5:  a87bbaa84f ! 5:  cf50f1aabc cache-tree: fix verify_cache() to catch non-adjacent D/F conflicts
     @@ Commit message
          Signed-off-by: Elijah Newren [off-list ref]
      
       ## cache-tree.c ##
     +@@ cache-tree.c: void cache_tree_invalidate_path(struct index_state *istate, const char *path)
     + 		istate->cache_changed |= CACHE_TREE_CHANGED;
     + }
     + 
     ++/*
     ++ * Check whether this_ce and the next entry in the index form a D/F
     ++ * conflict ("path" vs "path/file").  Returns the conflicting "path/..."
     ++ * name when one is found, or NULL otherwise.
     ++ *
     ++ * The cache is sorted, so "path/file" sorts after "path" and the
     ++ * conflict is usually visible as adjacent entries.  But other entries
     ++ * can sort between them -- e.g. "path-internal" sits between "path"
     ++ * and "path/file" because '-' (0x2D) precedes '/' (0x2F) -- so when
     ++ * the immediately following entry shares our prefix but starts with a
     ++ * character that sorts before '/', binary search for "path/" instead.
     ++ */
     ++static const char *find_df_conflict(struct index_state *istate,
     ++				    const struct cache_entry *this_ce,
     ++				    const struct cache_entry *next_ce)
     ++{
     ++	const char *this_name = this_ce->name;
     ++	const char *next_name = next_ce->name;
     ++	int this_len = ce_namelen(this_ce);
     ++	const struct cache_entry *other;
     ++	struct strbuf probe = STRBUF_INIT;
     ++	int pos;
     ++
     ++	if (this_len >= ce_namelen(next_ce) ||
     ++	    next_name[this_len] > '/' ||
     ++	    strncmp(this_name, next_name, this_len))
     ++		return NULL;
     ++
     ++	if (next_name[this_len] == '/')
     ++		return next_name;
     ++
     ++	strbuf_add(&probe, this_name, this_len);
     ++	strbuf_addch(&probe, '/');
     ++	pos = index_name_pos_sparse(istate, probe.buf, probe.len);
     ++	strbuf_release(&probe);
     ++
     ++	if (pos < 0)
     ++		pos = -pos - 1;
     ++	if (pos >= (int)istate->cache_nr)
     ++		return NULL;
     ++	other = istate->cache[pos];
     ++	if (ce_namelen(other) > this_len &&
     ++	    other->name[this_len] == '/' &&
     ++	    !strncmp(this_name, other->name, this_len))
     ++		return other->name;
     ++	return NULL;
     ++}
     ++
     + static int verify_cache(struct index_state *istate, int flags)
     + {
     + 	unsigned i, funny;
      @@ cache-tree.c: static int verify_cache(struct index_state *istate, int flags)
     + 	 */
     + 	funny = 0;
       	for (i = 0; i + 1 < istate->cache_nr; i++) {
     - 		/* path/file always comes after path because of the way
     - 		 * the cache is sorted.  Also path can appear only once,
     +-		/* path/file always comes after path because of the way
     +-		 * the cache is sorted.  Also path can appear only once,
      -		 * which means conflicting one would immediately follow.
     -+		 * so path/file is likely the immediately following path
     -+		 * but might be separated if there is e.g. a
     -+		 * path-internal/... file.
     - 		 */
     +-		 */
       		const struct cache_entry *this_ce = istate->cache[i];
       		const struct cache_entry *next_ce = istate->cache[i + 1];
     - 		const char *this_name = this_ce->name;
     - 		const char *next_name = next_ce->name;
     - 		int this_len = ce_namelen(this_ce);
     -+		const char *conflict_name = NULL;
     -+
     - 		if (this_len < ce_namelen(next_ce) &&
     +-		const char *this_name = this_ce->name;
     +-		const char *next_name = next_ce->name;
     +-		int this_len = ce_namelen(this_ce);
     +-		if (this_len < ce_namelen(next_ce) &&
      -		    next_name[this_len] == '/' &&
     -+		    next_name[this_len] <= '/' &&
     - 		    strncmp(this_name, next_name, this_len) == 0) {
     -+			if (next_name[this_len] == '/') {
     -+				conflict_name = next_name;
     -+			} else if (next_name[this_len] < '/') {
     -+				/*
     -+				 * The immediately next entry shares our
     -+				 * prefix but sorts before "path/" (e.g.,
     -+				 * "path-internal" between "path" and
     -+				 * "path/file", since '-' (0x2D) < '/'
     -+				 * (0x2F)).  Binary search to find where
     -+				 * "path/" would be and check for a D/F
     -+				 * conflict there.
     -+				 */
     -+				struct cache_entry *other;
     -+				struct strbuf probe = STRBUF_INIT;
     -+				int pos;
     -+
     -+				strbuf_add(&probe, this_name, this_len);
     -+				strbuf_addch(&probe, '/');
     -+				pos = index_name_pos_sparse(istate,
     -+							    probe.buf,
     -+							    probe.len);
     -+				strbuf_release(&probe);
     -+
     -+				if (pos < 0)
     -+					pos = -pos - 1;
     -+				if (pos >= (int)istate->cache_nr)
     -+					continue;
     -+				other = istate->cache[pos];
     -+				if (ce_namelen(other) > this_len &&
     -+				    other->name[this_len] == '/' &&
     -+				    !strncmp(this_name, other->name, this_len))
     -+					conflict_name = other->name;
     -+			}
     -+		}
     +-		    strncmp(this_name, next_name, this_len) == 0) {
     ++		const char *conflict_name;
      +
     ++		conflict_name = find_df_conflict(istate, this_ce, next_ce);
      +		if (conflict_name) {
       			if (10 < ++funny) {
       				fprintf(stderr, "...\n");
     @@ cache-tree.c: static int verify_cache(struct index_state *istate, int flags)
       			}
       			fprintf(stderr, "You have both %s and %s\n",
      -				this_name, next_name);
     -+				this_name, conflict_name);
     ++				this_ce->name, conflict_name);
       		}
       	}
       	if (funny)

-- 
gitgitgadget
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help