Thread (218 messages) 218 messages, 10 authors, 2022-06-18

Re: [PATCH 08/12] merge-ort: provide a merge_get_conflicted_files() helper function

From: Johannes Schindelin <hidden>
Date: 2022-02-21 11:12:18

Hi Elijah,

On Fri, 4 Feb 2022, Elijah Newren wrote:
On Fri, Feb 4, 2022 at 3:10 PM Johannes Schindelin
[off-list ref] wrote:
quoted
On Sat, 29 Jan 2022, Elijah Newren wrote:
quoted
On Sat, Jan 29, 2022 at 12:23 AM Johannes Sixt [off-list ref] wrote:
quoted
Just a heckling from the peanut gallery...

Am 29.01.22 um 07:08 schrieb Elijah Newren:
quoted
On Fri, Jan 28, 2022 at 8:55 AM Johannes Schindelin
[off-list ref] wrote:
quoted
Meaning: Even if stage 3 is missing from the first conflict and stage 1 is
missing from the second conflict, in the output we would see stages 1, 2,
2, 3, i.e. a duplicate stage 2, signifying that we're talking about two
different conflicts.
I don't understand why you're fixating on the stage here.  Why would
you want to group all the stage 2s together, count them up, and then
determine there are N conflicting files because there are N stage 2's?
Looks like you are misunderstanding Dscho's point: When you have two
conflicts, the first with stages 1 and 2, the second with stages 2 and
3, then the 2s occur lumped together when the 4 lines are printed in a
row, and that is the cue to the parser where the new conflict begins.
Dscho did not mean that all N 2s of should be listed together.
Ah, so...I didn't understand his misunderstanding?  Using stages as a
cue to the parser where the new conflict begins is broken; you should
instead check for when the filename listed on a line does not match
the filename on the previous line.
But that would break down in case of rename/rename conflicts, right?
quoted
In particular, if one conflict has stages 1 and 2, and the next conflict
has only stage 3, then looking at stages only might cause you to
accidentally lump unrelated conflicts together.
Precisely. That's why I would love to have a way to deviate from the
output of `ls-files -u`'s format, and have a reliable way to indicate
stages that belong to the same merge conflict.
Ah, attempting to somehow identify and present logical separate
conflicts?  That could be awesome, but I'm not sure it's technically
possible.  It certainly isn't with today's merge-ort.

Let me ask some questions first...

If I understand you correctly then in the event of a rename/rename,
i.e. foo->bar & foo->baz, then you want foo's, bar's, & baz's stages
all listed together.  Right?  And in some way that you can identify
them as related?
Yes, that was kind of my idea ;-)
If we do so, how do we mark the beginning and the end of what you call
"the same merge conflict"?  If you say it's always 3 stages (with the
possibility of all-zero modes/oids), then what about the rename/rename
case above modified so that the side that did foo->baz also added a
different 'bar'?  That'd be 4 non-zero modes/oids, all of them
relevant.  Or what if the side that did foo->bar also renamed
something else to 'baz', giving us even more non-zero stages for these
three paths?  Perhaps you consider these different conflicts and want
them listed separately -- if so, where does one conflict begin and
another start and which stages are parts of which conflict?
Thank you for calling me out on an only half-finished thought.

To be quite honest, I previously did not have much time to think of the
non-trivial nature of representing merge conflicts in a web UI when
performing merges with rename detection, in particular when performing
_recursive_ merges (where, as you point out so correctly, an arbitrary
number of rename conflicts can happen _for the same file_).

Of course, this gets even more complicated when we're also thinking about
type changes (file -> symlink, file -> directory, and vice versa).
If you are attempting to somehow present the stuff that "belongs to
the same merge conflict" are you also trying to identify what kind of
merge conflict it is?  If so, do you want each type of merge conflict
listed?  For example, let's switch from the example above of logically
disjoint paths coming together to result in more than 3 stages, and
instead pick an example with a single logical path with less than
three stages.  And now let's say that path has multiple conflicts
associated with it; let's use an example with 3: rename/delete +
modify/delete + directory/file (one side renames foo->bar while
modifying the contents, the other side deletes foo and adds the
directory 'bar/').  In this case, there is a target file 'bar' that
has two non-zero modes/oids in the ls-files-u output.  If all three
types of conflicts need to be listed, does each need to be listed with
the two non-zero modes/oids (and perhaps one zero mode/oid), resulting
in six listings for 'bar'?  Or would the duplication be confusing
enough that we instead decide to list some merge conflicts with no
stages associated with them?

Thinking about both sets of questions in the last two paragraphs from
a higher level -- should we focus on and group the higher order stages
by the individual conflicts that happen, or should we group them by
the paths that they happen to (which is what `ls-files -u` happens to
do), or should we not bother grouping them and instead duplicate the
higher order stages for each logical conflict it is part of?

As an alternative to duplicating higher order stages, do we sometimes
decide to "lump" separate conflicts together and treat them as one
conflict?  If so, what are the rules on how we decide to lump
conflicts and when not to?  Is there a bright line boundary?  And can
it be done without sucking in arbitrarily more stages for a single
conflict?


Some testcases that might be useful while considering the above
questions: take a look at the "rad", "rrdd", and "mod6" tests of
t6422.  How many "same merge conflicts" are there for each of those,
and what's the boundary between them?  And can you give the answer in
the form of rules that generically handle all cases, rather than just
answering these three specific cases?


I've thought about this problem long and hard before (in part because
of some conversations I had with Edward Thompson about libgit2 and
Not a big deal for _me_, but I seem to remember that Ed cared a lot about
having no p in their surname ;-)
merging at Git Merge 2020).  It wasn't at all clear to me that libgit2
had considered anything beyond simple rename cases.  The only rules I
ever figured out that made sense to me was "group the stages by target
filename rather than by logical conflict" (so we get `ls -files -u`
populated) and print a meant-for-human message for each logical
conflict (found in the <Informational Messages> section for
merge-tree), and make NO attempt to connect stages by conflict type.

I'm sure that's not what you wanted to hear, and maybe doesn't even
play nicely with your design.  But short of ignoring the edge and
corner cases, I don't see how to solve that problem.  If you do just
want to ignore edge and corner cases, then just ignore the
rename/rename case you brought up in the first place and just use
`ls-files -u`-type output as-is within your design.  If you don't want
to ignore edge cases and want something that works with a specific
design that somehow groups conflicted file stages by conflict type,
then we're going to have to dig into all these questions above and do
some big replumbing within merge-ort.
There is sometimes a big difference between what I want to hear and what I
need to hear. Thank you for providing so many details that I needed to
hear.

So let's take a step back and look at my goal here, as in: the
over-arching goal: to use merge-ort on the server side.

From what you said above, it becomes very clear to me that there is very
little chance to resolve such conflicts on the server side.

For example, if a topic branch renames a file differently than the main
branch, there is a really good chance that the user tasked with merging
the topic branch will have to do a whole lot more than just click a few
buttons to perform that task. There might very well be the need to edit
files that do not contain merge conflict markers (I like to call those
cases "non-semantic merge conflicts"), and almost certainly local testing
will be necessary.

So I guess the best we can do in those complicated cases is to give a
comprehensive overview of the problems in the web UI, with the note that
this merge conflict has to be resolved on the local side.

Which brings me to the next concern: since `merge-tree` is a low-level
tool meant to be called by programs rather than humans, we need to make
sure that those messages remain machine-parseable, even if they contain
file names.

Concretely: while I am not currently aware of any web UI that allows to
resolve simple rename/rename conflicts, it is easily conceivable how to
implement such a thing. When that happens, we will need to be able to
teach the server-side code to discern between the cases that can be
handled in the web UI (trivial merge conflicts, trivial rename/rename
conflicts) as compared to scenarios where the conflicts are just too
complex.

Here's an excerpt from t4301:

-- snip --
Auto-merging greeting
CONFLICT (content): Merge conflict in greeting
Auto-merging numbers
CONFLICT (file/directory): directory in the way of whatever from side1; moving it to whatever~side1 instead.
CONFLICT (modify/delete): whatever~side1 deleted in side2 and modified in side1.  Version side1 of whatever~side1 left in tree.
-- snap --

This is the complete set of messages provided in the `test conflict
notices and such` test case.

I immediately notice that every line contains at least one file name.
Looking at https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1899, it
does not seem as if the file names are quoted:

		path_msg(opt, path, 1, _("Auto-merging %s"), path);

(where `path` is used verbatim in a call to `merge_3way()` before that,
i.e. it must not have been quoted)

I would like to register a wish to ensure that file names with special
characters (such as most notably line-feed characters) are quoted in these
messages, so that a simple server-side parser can handle messages starting
with `Auto-merging` and with `CONFLICT (content): Merge conflict in `, and
"throw the hands up in the air" if any other message prefix is seen.

Do you think we can switch to `sq_quote_buf_pretty()` for these messages?
For the `Auto-merging` one, it would be trivial, but I fear that we will
have to work a bit on the `path_msg()` function
(https://github.com/git/git/blob/v2.35.1/merge-ort.c#L630-L649) because it
accepts a variable list of arguments without any clue whether the
arguments refer to paths or not. (And I would be loathe to switch _all_
callers to do the quoting themselves.)

I see 28 calls to that function, and at least a couple that pass not only
a path but also an OID (e.g.
https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1611-L1613).

We could of course be sloppy and pass even OIDs through
`sq_quote_buf_pretty()` in `path_msg()`, knowing that there won't be any
special characters in them, but it gets more complicated e.g. in
https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1648-L1651, where we
pass an `strbuf` that contains a somewhat free-form commit message.

I guess we could still pass those through `sq_quote_buf_pretty()`, even if
they are not paths, to ensure that there are no special characters in the
machine-parseable lines.

What do you think?

Ciao,
Dscho
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help