Re: [PATCH 6/6] merge-ort: fix directory rename on top of source of other rename/delete
From: Elijah Newren <hidden>
Date: 2025-08-04 22:33:27
On Fri, Aug 1, 2025 at 1:31 AM Patrick Steinhardt [off-list ref] wrote:
On Tue, Jul 22, 2025 at 03:23:11PM +0000, Elijah Newren via GitGitGadget wrote: What a massive commit message. It almost felt like a blog post rather than a commit message, but I certainly don't mind the additional context.
Yeah...it certainly is. I was spinning my wheels for a few weeks because of * the number of items needed to trigger the issue * the misleading/buggy testcases we had, now addressed earlier in this series * the number of other nearby bugs that also existed * the fact that "relevant renames" sometimes tricks you into thinking you're testing a rename testcase when you're not and when I started explaining it, I realized there was lots of assumed background needed to understand the bug that I'm not sure others on the list would have. Besides, I was worried that _I_ would forget all these details in 6 months, so I wanted it all spelled out. Thanks for being understanding of the lengthy tome this commit message became. And for reading all of it!
quoted
From: Elijah Newren <redacted> At GitHub, we've got a real-world repository that has been triggering failures of the form: git: merge-ort.c:3007: process_renames: Assertion `newinfo && !newinfo->merged.clean' failed. which comes from the line: VERIFY_CI(newinfo); Unfortunately, this one has been quite complex to unravel, and is a bit complex to explain. So, I'm going to carefully try to explain each relevant piece needed to understand the fix, then carefully build up from a simple testcase to some of the relevant testcases. == New special case we need to consider == Rename pairs in the diffcore machinery connect the source path of a rename with the destination path of a rename. Since we have rename pairs to consider on both sides of history since the merge base, merging has to consider a few special cases of possible overlap: A) two rename pairs having the same target path B) two rename pairs having the same source path C) the source path of one rename pair being the target path of a different rename pairSo basically file A get's moved somewhere else and then replaced by a different file B?
Yup.
quoted
Some of these came up often enough that we gave them names: A) a rename/rename(2to1) conflict (looks similar to an add/add conflict) B) a rename/rename(1to2) conflict, which represents the same path being renamed differently on the two sides of history C) not yet named merge-ort is well-prepared to handle cases (A) and (B), as was merge-recursive (which was merge-ort's predecessor). Case (C) was briefly considered during the years of merge-recursive maintenance, but the full extent of support it got was a few FIXME/TODO comments littered around the code highlighting some of the places that would probably need to be fixed to support it. When I wrote merge-ort I ignored case (C) entirely, since I believed that case (C) was only possible if we were to support break detection during merges. Not only had break detection never been supported by any merge algorithm, I thought break detection wasn't worth the effort to support in a merge algorithm. However, it turns out that case (C) can be triggered without break detection, if there's enough moving pieces. Before I dive into how to trigger case (C) with directory renames plus other renames, it might be helpful to use a simpler example with break detection first. And before we get to that it may help to explain some more basics of handling renames in the merge algorithm. So, let me first backup and provide a quick refresher on on each ofs/on on/on/
Thanks.
[snip]quoted
== Directory rename detection == If one side of history renames directory D/ -> E/, and the other side of history adds new files to E/, then directory rename detection noticesDid you mean to say "D/" here?
Yes, thanks.
[snip]quoted
== Testcases 8+ == Another bonus bug, found via understanding our final solution (and the failure of our first attempted solution)!s/solution/solutions/ as there are multiple attempted solutions that were discarded?
Yeah, I typed up this commit message and then found more issues, and inserted them earlier. I'll fix up the wording; thanks.
quoted
diff --git a/merge-ort.c b/merge-ort.c index feb06720c7e1..f1ecccee940b 100644 --- a/merge-ort.c +++ b/merge-ort.c@@ -2313,14 +2313,20 @@ static char *apply_dir_rename(struct strmap_entry *rename_info, return strbuf_detach(&new_path, NULL); } -static int path_in_way(struct strmap *paths, const char *path, unsigned side_mask) +static int path_in_way(struct strmap *paths, + const char *path, + unsigned side_mask, + struct diff_filepair *p) { struct merged_info *mi = strmap_get(paths, path); struct conflict_info *ci; if (!mi) return 0; INITIALIZE_CI(ci, mi); - return mi->clean || (side_mask & (ci->filemask | ci->dirmask)); + return mi->clean || (side_mask & (ci->filemask | ci->dirmask)) + // See testcases 12n, 12p, 12q for more details on this next conditionThis should use `/* */`-style comments.
Yep, will fix.
quoted
+ || ((ci->filemask & 0x01) && + strcmp(p->one->path, path));So if we have a stage 1 index entry and the path is the same due to a transitive rename we can say that the path is not in the way?
Right if A -> A, then we know that the original A and the new A do in fact have related contents and thus the old A is not in the way of the new A. I would have rather just checked for a stage 1 index entry and said that the presence of such a thing means there's a file in the way (that's what I originally did), but the rename-to-self case is special and is why the strcmp condition is there.