machine-parsable git-merge-tree messages (was: [PATCH 08/12] merge-ort: provide a merge_get_conflicted_files() helper function)
From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2022-02-21 14:37:45
Subsystem:
the rest · Maintainer:
Linus Torvalds
On Mon, Feb 21 2022, Johannes Schindelin wrote: [I sent out an empty reply to this earlier by mistake, sorry about that]
[...] Which brings me to the next concern: since `merge-tree` is a low-level tool meant to be called by programs rather than humans, we need to make sure that those messages remain machine-parseable, even if they contain file names. Concretely: while I am not currently aware of any web UI that allows to resolve simple rename/rename conflicts, it is easily conceivable how to implement such a thing. When that happens, we will need to be able to teach the server-side code to discern between the cases that can be handled in the web UI (trivial merge conflicts, trivial rename/rename conflicts) as compared to scenarios where the conflicts are just too complex. Here's an excerpt from t4301: -- snip -- Auto-merging greeting CONFLICT (content): Merge conflict in greeting Auto-merging numbers CONFLICT (file/directory): directory in the way of whatever from side1; moving it to whatever~side1 instead. CONFLICT (modify/delete): whatever~side1 deleted in side2 and modified in side1. Version side1 of whatever~side1 left in tree. -- snap -- This is the complete set of messages provided in the `test conflict notices and such` test case. I immediately notice that every line contains at least one file name. Looking at https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1899, it does not seem as if the file names are quoted: path_msg(opt, path, 1, _("Auto-merging %s"), path); (where `path` is used verbatim in a call to `merge_3way()` before that, i.e. it must not have been quoted) I would like to register a wish to ensure that file names with special characters (such as most notably line-feed characters) are quoted in these messages, so that a simple server-side parser can handle messages starting with `Auto-merging` and with `CONFLICT (content): Merge conflict in `, and "throw the hands up in the air" if any other message prefix is seen. Do you think we can switch to `sq_quote_buf_pretty()` for these messages? For the `Auto-merging` one, it would be trivial, but I fear that we will have to work a bit on the `path_msg()` function (https://github.com/git/git/blob/v2.35.1/merge-ort.c#L630-L649) because it accepts a variable list of arguments without any clue whether the arguments refer to paths or not. (And I would be loathe to switch _all_ callers to do the quoting themselves.) I see 28 calls to that function, and at least a couple that pass not only a path but also an OID (e.g. https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1611-L1613). We could of course be sloppy and pass even OIDs through `sq_quote_buf_pretty()` in `path_msg()`, knowing that there won't be any special characters in them, but it gets more complicated e.g. in https://github.com/git/git/blob/v2.35.1/merge-ort.c#L1648-L1651, where we pass an `strbuf` that contains a somewhat free-form commit message. I guess we could still pass those through `sq_quote_buf_pretty()`, even if they are not paths, to ensure that there are no special characters in the machine-parseable lines. What do you think?
That sounds like a rather nasty hack, this is too, but demonstrates that we can pretty easily extract this in a machine-readable format with just a few lines now):
diff --git a/merge-ort.c b/merge-ort.c
index 8a5f201d190..a906881f9b3 100644
--- a/merge-ort.c
+++ b/merge-ort.c@@ -633,7 +633,7 @@ static void path_msg(struct merge_options *opt, int omittable_hint, /* skippable under --remerge-diff */ const char *fmt, ...) { - va_list ap; + va_list ap, cp; struct strbuf *sb, *dest; struct strbuf tmp = STRBUF_INIT;
@@ -650,7 +650,9 @@ static void path_msg(struct merge_options *opt, dest = (opt->record_conflict_msgs_as_headers ? &tmp : sb); + va_copy(cp, ap); va_start(ap, fmt); + if (opt->priv->call_depth) { strbuf_addchars(dest, ' ', 2); strbuf_addstr(dest, "From inner merge:");
@@ -659,6 +661,15 @@ static void path_msg(struct merge_options *opt, strbuf_vaddf(dest, fmt, ap); va_end(ap); + va_start(cp, fmt); + trace2_region_enter_printf("merge", "conflict/path", opt->repo, "%s", path); + trace2_region_leave("merge", "conflict/path", opt->repo); + trace2_region_enter_printf("merge", "conflict/fmt", opt->repo, "%s", fmt); + trace2_region_leave("merge", "conflict/fmt", opt->repo); + trace2_region_enter_printf_va("merge", "conflict/msg", opt->repo, fmt, cp); + trace2_region_leave("merge", "conflict/msg", opt->repo); + va_end(cp); + if (opt->record_conflict_msgs_as_headers) { int i_sb = 0, i_tmp = 0;
You can run that with one of the tests added in this series to get the
output as JSON, e.g.:
GIT_TRACE2_EVENT=/dev/stderr GIT_TRACE2_EVENT_NESTING=10 ~/g/git/git merge-tree --write-tree --no-messages --name-only --messages side1 side2 2>&1|jq -r .| grep '"msg"'
"msg": "whatever~side1"
"msg": "CONFLICT (file/directory): directory in the way of %s from %s; moving it to %s instead."
"msg": "CONFLICT (file/directory): directory in the way of whatever from side1; moving it to whatever~side1 instead."
"msg": "whatever~side1"
"msg": "CONFLICT (modify/delete): %s deleted in %s and modified in %s. Version %s of %s left in tree."
"msg": "CONFLICT (modify/delete): whatever~side1 deleted in side2 and modified in side1. Version side1 of whatever~side1 left in tree."
"msg": "numbers"
"msg": "Auto-merging %s"
"msg": "Auto-merging numbers"
"msg": "greeting"
"msg": "Auto-merging %s"
"msg": "Auto-merging greeting"
"msg": "greeting"
"msg": "CONFLICT (%s): Merge conflict in %s"
"msg": "CONFLICT (content): Merge conflict in greeting"
A "proper" fix for this doesn't sound too hard, we'd just instrument the
path_msg() function to pass along some "message category", see
e.g. unpack_plumbing_errors in unpack-trees.c for one example of such a
thing, or the "enum fsck_msg_id".
Then we'd just allow you to emit any of the sprintf() format itself, or
the expanded version, the path, or an id like "CONFLICT:file/directory"
or "auto-merging" etc.
I think that would be particularly useful in conjuction with the
--format changes I was proposing for this (and hacked up a WIP patch
for). You could just have a similar --format-messages or whatever.
Then you could pick \0\0 as a delimiter for your "main" --format, and
"\0" as the delimiter for your --format-messages, and thus be able to
parse N-nested \0-delimited content.