Re: [RFC PATCH 0/2] Introduce new merge-tree-ort command
From: Elijah Newren <hidden>
Date: 2022-01-05 16:54:03
On Wed, Jan 5, 2022 at 8:33 AM Christian Couder [off-list ref] wrote:
During the 2nd Virtual Git Contributors’ Summit last October, and even
before, the subject of performing server side merges and rebases came
up, as platforms like GitHub and GitLab would like to support many
features and data formats that libgit2 doesn't support, like for
example SHA256 hashes and partial clone.
It's hard for them to get rid of libgit2 though, because Git itself
doesn't have a good way to support server side merges and rebases,
while libgit2 has ways to perform them. Without server side merges and
rebases, those platforms would have to launch some kind of checkout,
which can be very expensive, before any merge or rebase.
The latest discussions on this topic following the 2nd Virtual
Summit[1] ended with some proposals around a `git merge-tree` on
steroids that could be a good solution to this issue.
The current `git merge-tree` command though seems to have a number of
issues, especially:
- it's too closely tied to the old merge-recursive strategy, which
has not been the default since v2.34.0 and is likely to be
deprecated over time,
- it seems to output things in its own special format, which is not
easy to customize, and which needs special code and logic to parse
I agree we don't want those... but why would new merge-tree options have to use the old merge strategy or the old output format?
To move forward on this, this small RFC patch series introduces a new `git merge-tree-ort` command with the following design:
Slightly dislike the command name. `ort` was meant as a temporary, internal name. I don't think it's very meaningful to users, so I was hoping to just make `recursive` mean `ort` after we had enough testing, and to delete merge-recursive.[ch] at that time. Then `ort` merely becomes a historical footnote (...and perhaps part of the name of the file where the `recursive` algorithm is implemented).
- it uses merge-ort's API as is to perform the merge
- it gets back a tree oid and a cleanliness status from merge-ort's
API and prints them out first
Good so far.
- it uses diff's API as is to output changed paths and code
- the diff API, actually diff_tree_oid() is called 3 times: once for
the diff versus branch1 ("ours"), once for the diff versus branch2
("theirs"), and once for the diff versus the base.
Why? That seems to be a performance penalty for anyone that doesn't want/need the diffs, and since we return a tree, a caller can go and get whatever diffs they like.
Therefore:
- its code is very simple and very easy to extend and customize, for
example by passing diff or merge-ort options that the code would
just pass on to the merge-ort and diff APIs respectively
- its output can easily be parsed using simple code
These points are good.
and existing diff parsers
This of course means that merge-tree-ort's output is not backward compatible with merge-tree's output, but it doesn't seem that there is much value in keeping the same output anyway. On the contrary, merge-tree's output is likely to hold us back already.
The first patch in the series adds the new command without any tests or documentation. The second patch in the series adds a few tests that let us see what the command's output looks like in different, very simple cases.
Of course, if this approach is considered valuable, I plan to add some documentation, more tests and very likely a number of options before submitting the next iteration.
Was there something you didn't like about https://lore.kernel.org/git/pull.1114.git.git.1640927044.gitgitgadget@gmail.com/?
I am not sure that it's worth showing the 3 diffs (versus branch1, branch2 and base) by default. Maybe by default no diff at all should be shown and the command should have --branch1 (or --ours), --branch2 (or --theirs) and --base options to ask for such output, but for an RFC patch I thought it would be better to output the 3 diffs so that people get a better idea of the approach this patch series is taking.
I think not showing them, whether by default or at all, would be better. All three of these are things users could easily generate for themselves with the tree we return. I'm curious, though: what's the use case for wanting these specific diffs?
Two things you didn't return that users cannot get any other way: (1) conflict and warning messages, and (2) the list of conflicted paths.
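To illustrate the point that a caller can generate the three diffs from the returned tree, here is a minimal Python sketch. It assumes the caller already holds the merged tree OID plus the base, "ours" and "theirs" revisions. The helper names `diff_tree_commands` and `run_diffs` are hypothetical and not part of the patch series; the only Git command relied on is `git diff-tree -p`, which accepts two tree-ishes.

```python
import subprocess
from typing import Dict, List


def diff_tree_commands(merged_tree: str, base: str, ours: str,
                       theirs: str) -> Dict[str, List[str]]:
    """Build one `git diff-tree -p` command line per comparison point.

    Each command diffs from the given revision to the merged tree, so the
    patch shows what the merge result changed relative to that revision.
    """
    return {
        label: ["git", "diff-tree", "-p", rev, merged_tree]
        for label, rev in (("base", base), ("ours", ours), ("theirs", theirs))
    }


def run_diffs(commands: Dict[str, List[str]], repo: str = ".") -> Dict[str, str]:
    """Run each command in `repo` and return the patch text keyed by label."""
    return {
        label: subprocess.run(argv, cwd=repo, check=True,
                              capture_output=True, text=True).stdout
        for label, argv in commands.items()
    }
```

A caller that needs only one of the diffs can run just that entry, which avoids paying for the other two.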
[1] https://lore.kernel.org/git/nycvar.QRO.7.76.6.2110211147490.56@tvgsbejvaqbjf.bet/

Christian Couder (2):
  merge-ort: add new merge-tree-ort command
  merge-ort: add t/t4310-merge-tree-ort.sh

 .gitignore                |   1 +
 Makefile                  |   1 +
 builtin.h                 |   1 +
 builtin/merge-tree-ort.c  |  93 ++++++++++++++++++++++
 git.c                     |   1 +
 t/t4310-merge-tree-ort.sh | 162 ++++++++++++++++++++++++++++++++++++++
 6 files changed, 259 insertions(+)
 create mode 100644 builtin/merge-tree-ort.c
 create mode 100755 t/t4310-merge-tree-ort.sh

--
2.34.1.433.g7bc349372a.dirty