Re: [PATCH 08/12] merge-ort: provide a merge_get_conflicted_files() helper function
From: Johannes Schindelin <hidden>
Date: 2022-02-25 16:26:21
Hi Elijah, On Tue, 22 Feb 2022, Elijah Newren wrote:
On Tue, Feb 22, 2022 at 8:54 AM Johannes Schindelin [off-list ref] wrote:quoted
On Mon, 21 Feb 2022, Johannes Schindelin wrote:quoted
[...] since `merge-tree` is a low-level tool meant to be called by programs rather than humans, we need to make sure that those messages remain machine-parseable, even if they contain file names. [...] Do you think we can switch to `sq_quote_buf_pretty()` for these messages?Or maybe much better: use NUL to separate those messages if `-z` is passed to `merge-tree`? That would address the issue in one elegant diff. What do you think?Separating the combination of messages for a single target path from the combination of messages for a different target path by a NUL character may make sense. Would we also want the messages for a path to be prepended by the pathname and a NUL character, in this case, to make it easier to determine which path the group of messages are for? I'm not sure if that does exactly what you are asking, though.
The most important thing I am asking for is a way for a program calling `merge-tree` to figure out whether it knows how to handle all the types of conflicts encountered in this merge. So that a web UI could present e.g. simple content conflicts, and even rename/rename conflicts, but would know that it cannot resolve the conflicts e.g. when a submodule is affected. So... I am fairly certain that we're not as close to addressing this as I had hoped for.
The thing that is stored (in opt->priv->output) is a strbuf per path, not an array of strbufs per path. So, if we have a rename/delete and a rename/add and a mode conflict for the same "path" (A->B on one side, other side deletes A and adds a symlink B, resulting in three messages for path "B" that are all appended into a single strbuf), then we'll have a single "message" which has three newlines. We can add a NUL character at the end of that, but not between the messages without restructuring things a bit. There's also at least one example, with submodules, where there are two path_msg() calls for the same individual conflict in order to split conflict info from resolution advice, and those really shouldn't be thought of as messages for different conflicts. (I'm starting to wonder if the resolution advice should just be tossed; I kept it because merge-recursive had it, but it might not make sense with merge-ort being used by server side merges. But even if we toss that one, I'm not sure I want to commit to one path_msg() call per "logical conflict".) But...maybe this would be good enough for some kind of use you have? Because if you only want to care about "simple" cases, you could potentially define those as ones with only one newline in them.
We cannot rely on newline character parsing because that is a valid filename character on Unix: $ echo a >'with > a newline' $ ls -la with* -rw-r--r-- 1 me me 2 Feb 25 17:10 'with'$'\n''a newline' I guess we have to work harder on this and add more than just an `strbuf` so that we can output `<path>NUL<conflict-type>NUL` pairs (where we promise to keep the `<conflict-type>` strings constant) or something similar. Ciao, Dscho