Re: [PATCH 5/6] merge-ort: fix incorrect file handling
From: Elijah Newren <hidden>
Date: 2025-08-05 18:34:23
On Mon, Aug 4, 2025 at 9:39 PM Patrick Steinhardt [off-list ref] wrote:
On Mon, Aug 04, 2025 at 03:08:50PM -0700, Elijah Newren wrote:quoted
On Fri, Aug 1, 2025 at 1:31 AM Patrick Steinhardt [off-list ref] wrote:quoted
On Tue, Jul 22, 2025 at 03:23:10PM +0000, Elijah Newren via GitGitGadget wrote:quoted
Further, commit 98a1a00d5301 (t6423: add a testcase causing a failed assertion in process_renames, 2025-03-06), fixed an issue with rename-to-self but added a new testcase, 12n, that only checked for whether the merge ran to completion. A few commits ago, we modified this test to check for the number of entries in the index -- but noted that the number was wrong. And we also noted a silently-keep-instead-of-delete bug at the same time in the new testcase 12n2. Fix to merge-ort to prevent multiple bugs with rename-to-self cases: * silent deletion of file expected to be kept (t6423 testcase 12i2) * silent retention of file expected to be removed (t6423 testcase 12n2) * wrong number of extries left in the index (t6423 testcase 12n)I think it would have been nice to also go a bit more in depth for what the bug actually was and how it's fixed. You do add a comment, but that only adds a single sentence of context.Would something like this help: ...all of these issues come up because in a rename-to-self case, when we have a rename A->B, both A and B name the same file. The code in process_renames() assumes A & B are different, and tries to move the higher order stages and file contents so that they are associated just with the new path, but the assumptions of A & B being different can cause A to be deleted when it's not supposed to be or mark B as resolved and kept in place when it's supposed to be deleted. Since A & B are already the same path in the rename-to-self case, we can simply skip the steps in process_renames() for such files.Yes, it would!
Great, I'll include it in the next round, which I'll send out soon.
quoted
quoted
quoted
Signed-off-by: Elijah Newren <redacted> --- merge-ort.c | 11 +++++ t/t6423-merge-rename-directories.sh | 69 +++++++++++++++++++++++++++-- 2 files changed, 77 insertions(+), 3 deletions(-)diff --git a/merge-ort.c b/merge-ort.c index 9b9d82ed10f7..feb06720c7e1 100644 --- a/merge-ort.c +++ b/merge-ort.c@@ -2873,6 +2873,17 @@ static int process_renames(struct merge_options *opt, newinfo = new_ent->value; } + /* + * Directory renames can result in rename-to-self, which we + * want to skip so we don't mark oldpath for deletion. + * + * Note that we can avoid strcmp here because of prior + * diligence in apply_directory_rename_modifications() to + * ensure we reused existing paths from opt->priv->paths. + */ + if (oldpath == newpath) + continue;Makes me wonder whether the additional brittleness is worth the saved `strcmp()` comparison. But on the other hand we do have tests now that would break if the memory allocation patterns ever changed, so that's reassuring.There's no brittleness here; one of the many optimizations in merge-ort.c was to intern *all* pathnames in struct merge_options_internal's "paths" member; any code that needs to generate/compute a filename that may be part of the merge must check if that path already exists in opt->priv->paths, and if so use that pointer instead so that all filename comparisons can be done with cheap pointer comparisons. See the big comment near the top of merge_options_internal. Nearly all such string-equality-via-pointer-equality checks were introduced about the same time, and in other functions, which makes this one kind of an outlier. I figured whoever reviewed this patch or ran across this in the code might get surprised by the pointer comparison, so I tried to add a comment to address any questions. Looks like I wasn't thorough enough (and the first paragraph of the comment pre-dated my finding other bugs this patch fixed, which makes it slightly confusing), so I'll try to see if I can improve it for v2.I think the current version is good enough -- it feels brittle to me, but I don't see a strong reason to change it.
Okay. Note that if there was a strong reason to change it, the logic to do so would mean that the pointer-comparison-instead-of-strcmp optimization used on pathnames in several other places of merge-ort would also need to change, because they are all based on the same interning-of-pathname-strings to allow that optimization to be safe. I did notice that the merge_options_internal comment on the paths member could perhaps be improved slightly; I'll include that too.