Re: Metadata for merge conflicts during rebase (to aid rustc) and potential for better user experience?
From: Phillip Wood <hidden>
Date: 2026-01-06 14:30:05
Hi Esteban

On 24/12/2025 15:03, Esteban Küber wrote:
On Mon, Dec 22, 2025 at 1:56 PM D. Ben Knoble [off-list ref] wrote:
On Mon, Dec 22, 2025 at 9:31 AM Esteban Küber [off-list ref] wrote:
The questions I have are:
- can I *avoid* `--points-at` in any way to identify what branch we're rebasing onto?
According to "git help rebase", ORIG_HEAD is not reliable but @{1} should be.
After talking with other members of the compiler team, people have concerns about invoking git from the compiler, as it can be a vector for unwanted behavior.
If we're talking about "git rev-parse --git-path" then that does not run any hooks or external processes. In a linked worktree or submodule, ".git" is a file rather than a directory. You will need to read the file (which looks like "gitdir: <path>\n") to find the path to the directory.
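A minimal sketch of reading that file without invoking git, assuming only the "gitdir: <path>\n" layout described above. The function names `parse_gitdir_file` and `resolve_git_dir` are made up for illustration, and the handling of relative paths (resolved against the directory that holds the ".git" file) is an assumption of this sketch rather than something stated in this thread.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Extract the path from the contents of a ".git" file ("gitdir: <path>\n").
/// Returns None if the contents do not have that shape.
fn parse_gitdir_file(contents: &str) -> Option<&str> {
    let path = contents
        .strip_prefix("gitdir: ")?
        .trim_end_matches(|c: char| c == '\n' || c == '\r');
    if path.is_empty() {
        None
    } else {
        Some(path)
    }
}

/// Best-effort lookup of the git directory for `worktree` without running git.
fn resolve_git_dir(worktree: &Path) -> Option<PathBuf> {
    let dot_git = worktree.join(".git");
    if dot_git.is_dir() {
        // Ordinary repository: ".git" is the directory itself.
        return Some(dot_git);
    }
    // Linked worktree or submodule: ".git" is a file pointing elsewhere.
    let contents = fs::read_to_string(&dot_git).ok()?;
    let target = Path::new(parse_gitdir_file(&contents)?);
    // Assumption: a relative path is relative to the directory holding ".git".
    Some(if target.is_absolute() {
        target.to_path_buf()
    } else {
        worktree.join(target)
    })
}
```

Since this is only for diagnostics, returning `None` on anything unexpected and emitting no extra note seems like the safest fallback.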
I would agree with that assessment, so I am trying to settle on a mechanism where I can parse git state myself (on a best-effort basis; this is only for diagnostics, so fully featured support for all environments is not necessary).
- is there already a better way to identify if the rebase was triggered by `git rebase` or `git pull` (configured to rebase)?
I haven't studied the internals on this yet, but I think the common pattern is to look at REBASE_HEAD vs. MERGE_HEAD.
Thank you for the additional information! That prompted me to look into the rest of the files once more, which gave me some hacky ideas on how to get the data I want, and this indeed seems to be sufficient to differentiate these two.
- if neither of the above has a "yes" answer, would git consider *adding* that information, both for third-parties as well as to extend its own UI?
I think "git status" already shows some of this (maybe not the branches in question, but certainly the "it looks like you're in the middle of a rebase/merge/cherry-pick/etc.").
I looked around again and arrived at the following conclusions:
- presence of .git/rebase-merge (and its files) is enough to differentiate between a rebase and a merge
Being pedantic, the presence of ".git/rebase-merge" tells us that a rebase is in progress; it does not guarantee that the conflicts were created by the rebase, though, as it is possible for the user to run "git merge", "git cherry-pick" or "git revert" during a rebase. When a commit is being split, it is possible that the conflicts come from "git stash pop" if the user stashes some changes, edits a file, commits and then pops the stashed changes.
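To illustrate the caveat, here is a rough sketch that reports both signals independently instead of picking one. `ConflictHints` and `conflict_hints` are hypothetical names; the sketch only looks at the "rebase-merge" directory and the MERGE_HEAD file mentioned in this thread and deliberately ignores cherry-pick, revert and stash state.

```rust
use std::path::Path;

/// Hints about what may have produced the conflict markers.
/// Both fields can be true at once, e.g. "git merge" run during a rebase.
#[derive(Debug, PartialEq, Eq)]
struct ConflictHints {
    rebase_in_progress: bool,
    merge_in_progress: bool,
}

/// Inspect `git_dir` (the resolved git directory) for rebase and merge state.
fn conflict_hints(git_dir: &Path) -> ConflictHints {
    ConflictHints {
        // A rebase using the merge backend keeps its state in this directory.
        rebase_in_progress: git_dir.join("rebase-merge").is_dir(),
        // MERGE_HEAD is present while a merge is waiting to be concluded.
        merge_in_progress: git_dir.join("MERGE_HEAD").is_file(),
    }
}
```

A diagnostic built on this would probably want to word its note cautiously ("a rebase is in progress") rather than claim the rebase caused the conflict.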
- .git/rebase-merge/head-name is enough to identify one of the sections
Yes, that will give you the name of the branch being rebased.
- identifying *at least* one of the sections is enough to make the
output clear enough (even if ideally you'd identify both)
- the sha in FETCH_HEAD matching .git/rebase-merge/onto is enough
to identify that we're dealing with a `git pull --rebase`
Note that FETCH_HEAD stays around until it is overwritten by the next
fetch, so if I run

    git pull --rebase

followed by one of

    git rebase --autosquash [--keep-base]
    git rebase -i [--keep-base]

without running "git fetch" then ".git/rebase-merge/onto" will match
FETCH_HEAD, but I'm not running "git pull" and I'm not rebasing onto a
new base, so any conflicts come from re-arranging the existing commits,
not from changes in the upstream branch.
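A sketch of the two checks discussed above, operating on file contents that the caller has already read. It assumes "rebase-merge/head-name" holds a full ref such as "refs/heads/topic", and that each FETCH_HEAD line is tab-separated as "<object id>", an optional "not-for-merge" flag, and a description; that layout is an assumption about what git currently writes, not a documented guarantee. All function names are hypothetical.

```rust
/// Branch being rebased, from the contents of "rebase-merge/head-name".
/// Returns None when the contents are not a local branch ref.
fn branch_from_head_name(contents: &str) -> Option<&str> {
    contents
        .trim_end()
        .strip_prefix("refs/heads/")
        .filter(|name| !name.is_empty())
}

/// Object ids in FETCH_HEAD that were fetched for merging.
fn fetch_head_merge_ids(fetch_head: &str) -> Vec<&str> {
    fetch_head
        .lines()
        .filter_map(|line| {
            let mut fields = line.split('\t');
            let oid = fields.next()?;
            let flag = fields.next()?;
            if oid.is_empty() || flag == "not-for-merge" {
                None
            } else {
                Some(oid)
            }
        })
        .collect()
}

/// True if "rebase-merge/onto" names a commit recorded in FETCH_HEAD.
/// This is only a hint: FETCH_HEAD stays around until the next fetch, so a
/// later "git rebase -i" onto the same commit matches as well.
fn onto_matches_fetch_head(onto: &str, fetch_head: &str) -> bool {
    let onto = onto.trim();
    !onto.is_empty() && fetch_head_merge_ids(fetch_head).contains(&onto)
}
```

Because of the stale FETCH_HEAD case, a match is better read as "possibly a pull" than as proof of one.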
I think the most sensible way of solving this is for "git rebase" to
start writing a description of the "onto" commit to
".git/rebase-merge/onto-desc". That would allow the output of "git
status" to include the branch or tag that we're rebasing onto as well.
I've got a rough patch that creates that file in common cases. If the
base of the branch is not being changed the file contains "same base"
[1], if "onto" matches the upstream branch it contains "upstream <ref>"
where <ref> is the full ref of the upstream branch. If the argument
given to "--onto" is a ref then the file contains the full name of the
ref [2]. Finally when rebasing onto a new root commit it contains "new
root".
[1] Detecting that in the general case involves a revision walk which
I'd like to avoid so it only works in common cases like

    git rebase -i HEAD~<n>
    git rebase --keep-base --autostash
    git rebase -i --onto ...@{u}

[2] If "--onto" is omitted then it defaults to "<upstream>" so if the
user runs "git rebase some-branch" the file will contain
"refs/heads/some-branch". Unfortunately "git pull --rebase" passes
object ids rather than refnames when it runs "git rebase" so the
branch name is only detected when rebasing onto the upstream branch.

I'll try and post a patch next week.
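For a sense of how a consumer might read such a file, here is a sketch of a parser for the four cases described above. Note that "onto-desc" is only a proposal in this message and does not exist in git today, so the exact strings may well change; `OntoDesc` and `parse_onto_desc` are hypothetical names, and treating anything that starts with "refs/" as a full ref name is an assumption of the sketch.

```rust
/// The four descriptions proposed for ".git/rebase-merge/onto-desc".
#[derive(Debug, PartialEq, Eq)]
enum OntoDesc<'a> {
    /// The base of the branch is not being changed.
    SameBase,
    /// Rebasing onto a new root commit.
    NewRoot,
    /// "onto" matches the upstream branch; holds the full upstream ref.
    Upstream(&'a str),
    /// "--onto" was given a ref; holds its full name.
    Ref(&'a str),
}

/// Parse the proposed file contents; unknown contents yield None.
fn parse_onto_desc(contents: &str) -> Option<OntoDesc<'_>> {
    let text = contents.trim();
    match text {
        "" => None,
        "same base" => Some(OntoDesc::SameBase),
        "new root" => Some(OntoDesc::NewRoot),
        _ => match text.strip_prefix("upstream ") {
            Some(upstream) => Some(OntoDesc::Upstream(upstream)),
            // Assumption: a bare full ref name always starts with "refs/".
            None if text.starts_with("refs/") => Some(OntoDesc::Ref(text)),
            None => None,
        },
    }
}
```

Returning `None` for anything unrecognised would let a reader cope with the file being absent or with descriptions added later.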
- there's information that is only present in MERGE_MSG in
free-form text, that isn't present anywhere else
I assume that's the name of the branch we're merging into HEAD. For squash merges the equivalent file is SQUASH_MSG.
- I can extract the "missing" information for either the
identifying information of where we are merging, be it because of
a `git pull --no-rebase` or `git merge`; the only issue I see is
in having to rely that the output will not change from either of
"Merge branch 'main' into branch-name" and
"Merge branch 'main' of example.url:user/repo" (how much trouble
am I inviting if I were to try and rely on this text not changing
so that I can get 'main' and the remote url from here?)
I'd be surprised if the messages changed but I don't think anyone is going to pledge that they'll never change. You can read the object id out of MERGE_HEAD (that is always a file even if the repository is using the reftable backend) and use "git for-each-ref --points-at" to find the branch name.
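If relying on the free-form text is acceptable for diagnostics, a best-effort parser for the two subjects quoted above could look like the following sketch. It only recognises subjects beginning with "Merge branch '<name>'" with an optional " of <url>" and an optional trailing " into <branch>"; any other subject yields `None`. `MergeSubject` and `parse_merge_subject` are hypothetical names, and nothing here is a format git promises to keep.

```rust
/// Pieces recovered from the first line of MERGE_MSG.
#[derive(Debug, PartialEq, Eq)]
struct MergeSubject<'a> {
    /// Name of the branch being merged.
    branch: &'a str,
    /// Remote URL, when the subject has an " of <url>" part.
    remote: Option<&'a str>,
    /// Destination branch, when the subject has an " into <branch>" part.
    into: Option<&'a str>,
}

/// Best-effort parse of a "Merge branch '...'" subject line.
fn parse_merge_subject(merge_msg: &str) -> Option<MergeSubject<'_>> {
    let subject = merge_msg.lines().next()?.trim_end();
    let rest = subject.strip_prefix("Merge branch '")?;
    let (branch, rest) = rest.split_once('\'')?;
    // Split off a trailing " into <dest>" first, then a leading " of <url>".
    let (rest, into) = match rest.rsplit_once(" into ") {
        Some((head, dest)) => (head, Some(dest)),
        None => (rest, None),
    };
    let remote = match rest.strip_prefix(" of ") {
        Some(url) => Some(url),
        None if rest.is_empty() => None,
        // Unrecognised trailing text: give up rather than guess.
        None => return None,
    };
    Some(MergeSubject { branch, remote, into })
}
```

Giving up on anything unexpected keeps the diagnostic from printing a wrong branch name if the wording ever changes.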
First, the information present in MERGE_MSG should be available in a more structured format, to allow for tools to deal with git state in a less coupled way. (This might not be worth it, and the textual representation is already "stable enough" to rely on.)
That might be useful for "git status" as we could say which branch was being merged.
Secondly, and perhaps more importantly, when generating the diff markers that end up in the user files, their description includes only the full sha or HEAD, or the short-sha and the commit message. I would propose that the branch be identified as well in the generated code. This could look something like:
`git rebase`:
<<<<<<< HEAD [branch 'main']
In the general case HEAD isn't really the branch 'main'; it is main plus whatever commits we've already applied. I think I saw someone suggest [from 'main'], which might be better.
=======
>>>>>>> e644375 (commit message) [branch 'name']
Unless we're applying the last commit from the branch, this isn't branch 'name' but one of the commits from it.
`git merge`:
<<<<<<< HEAD [branch 'name']
======= ------- between this marker and `>>>>>>>` is the code from branch 'master'
I'm skeptical that we want to inject extra text into the conflicted region. It makes sense for rustc's diagnostics but it makes it harder to resolve the conflict if we inject them into the file.
println!("Hello, main!");
>>>>>>> [branch 'main']
For merges [branch '<name>'] definitely makes sense for the two merge heads, I'm not sure what we'd do for the merge base though.
`git pull --rebase`:
<<<<<<< HEAD [local branch 'main']
Do we really need a different label when pulling?
=======
>>>>>>> 8191e7e4f9f82be45bdd4e71c37d2adcf4f88aa2 [branch 'main' of example.tld:user/repo]
Ideally we'd use the remote tracking branch here when pulling from a configured remote repository, rather than giving the name of the branch on the remote and its URL.
`git pull --no-rebase`:
<<<<<<< HEAD [branch 'main' of example.tld:user/repo]
=======
>>>>>>> ebbeec7 (commit message) [local branch 'main']
The format doesn't have to match the above exactly, but having the commit *and branch* information will make it much easier for people to identify things at a glance, at the cost of some additional verbosity in the generated code. The source of the issue is that where "our" and "their" code is in the patch depends on a somewhat "arbitrary" distinction (as far as a non-implementer is concerned) and it *swaps places* depending on whether we are rebasing or merging. Adding some context to the resulting patches would go a long way toward mitigating the confusion this causes.
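The swap described above can be written down as a small lookup, which is roughly what a diagnostic would need in order to label each side. This is a sketch: `Operation`, `MarkerSide` and `describe_side` are hypothetical names and the wording of the descriptions is illustrative only. For a rebase, the HEAD side is described as the new base plus the commits already replayed, since HEAD does not necessarily point at the tip of a branch at that moment.

```rust
/// The operation that left the conflict markers behind.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operation {
    Merge,
    Rebase,
}

/// Which half of the conflicted region is being described.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum MarkerSide {
    /// Between `<<<<<<<` and `=======`.
    Head,
    /// Between `=======` and `>>>>>>>`.
    Other,
}

/// Describe where the code on one side of the markers comes from.
fn describe_side(op: Operation, side: MarkerSide) -> &'static str {
    match (op, side) {
        (Operation::Merge, MarkerSide::Head) => {
            "the branch that was checked out when the merge started"
        }
        (Operation::Merge, MarkerSide::Other) => "the branch being merged in",
        (Operation::Rebase, MarkerSide::Head) => {
            "the new base plus any commits already replayed onto it"
        }
        (Operation::Rebase, MarkerSide::Other) => {
            "a commit from the branch being rebased"
        }
    }
}
```

With this table, the user's own work appears on the HEAD side during a merge and on the other side during a rebase, which is the reversal that tends to confuse people.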
I agree having some indication of which branch each side comes from would be useful, but I think when rebasing it needs to be clear that the branch does not necessarily point to that particular commit.

Thanks

Phillip
Happy holidays,
Esteban Küber