Re: [PATCH] contrib/subtree: fix split with squashed subtrees
From: Phillip Wood <hidden>
Date: 2025-09-02 13:22:20
On 01/09/2025 21:43, Colin Stagner wrote:
On 9/1/25 08:54, Phillip Wood wrote: Colin Stagner [off-list ref] writes:quoted
quoted
- if test -n "$(git log -1 --grep="git-subtree-dir:" $rev)" + if test -n "$(git log -1 --grep="git-subtree-dir:" "$rev^!")"We could drop the "-1" as we're only considering a single commit.Concur.quoted
quoted
- if test -z "$(git log -1 --grep="git-subtree-mainline:" $rev)" && - test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" $rev)" + if test -z "$(git log -1 --grep="git-subtree-mainline:" "$rev^!")" && + test -z "$(git log -1 --grep="git-subtree-dir: $arg_prefix$" "$rev^!")"I'm less sure about this change. Is the second test checking making sure we don't prune this commit if it has an ancestor that is a subtree merge for the subtree we're interested in?The outer loop in git-subtree.sh:983 appears to iterate from the root commit forwards… and not from the HEAD backwards. git rev-list --topo-order --reverse --parents $rev $unrevs # ^^^^^^^^^ Since the iteration is ancestor-first, I'm having difficulty seeing why `should_ignore_subtree_split_commit()` would want to do an ancestor traversal at all. It already sees the commits ancestor-first. But there could be a reason that I don't know.
Ah, I was only looking at this patch, not how it was called. That begs the question "what's the point of these checks if we've already visited all the ancestors anyway". I think the answer is that it is pruning the recursion that happens in check_parent() and checking the commits that come from that rev-list command is pointless. The regression test introduced with this function only looks at $extracount which comes from the recursion. I haven't looked too closely but it would be nice if we could move this check so it is only run when check_parents() is recursing.
Here is a more long-winded breakdown of these tests. From what I can determine: test -z "$(git log -1 --grep="git-subtree-mainline:" "$rev") excludes squashed commits created from git subtree merge --prefix subM --squash srcBranch
> > The --squash creates two commits:
1. A single-parent "Squashed 'subM/' content from", which squashes the changes from srcBranch. This commit's tree is like the one on srcBranch. It does not have the `subM/` prefix.
Yes, and the commit message comes from squash_msg() which adds the "git-subtree-dir:" and "git-subtree-split:" trailers but not "git-subtree-mainline:".
2. A merge commit which rewrites the tree in (1) to add the `subM/` leading directory, then merge it with the current branch. The merge commit doesn't have any `git-subtree:` trailers.
Indeed. Running git subtree split --squash --rejoin --prefix subM will create a squash commit as above and a merge with a commit message created by rejoin_msg() and contains the "git-subtree-dir:", "git-subtree-split:" and "git-subtree-mainline:" trailers.
We must exclude (1) since the trees aren't actually compatible with HEAD. (They don't have the `subM` prefix). We must keep (2). The above `test -z` appears to do this.
So we'll exclude squashed merges but not squashed splits? I think you're right that we don't want "git log" to walk the history here.
I am *much* less certain about the second test: test -z "$(git log -1 \ --grep="git-subtree-dir: $arg_prefix$" $rev)" I think this was intended to keep the mainline portion from a previous `git subtree split --rejoin`. But if I remove this `test -z`, all the unit tests still pass—including mine. There may not be any test coverage for this line. I will probably omit this `test` from v2.
I'm not very familiar with git-subtree but I thought this was ensuring that we did not exclude the ancestors of a squash or split that involves the subtree that we're interested in. I wouldn't be surprised if the test coverage was lacking.
quoted
It would be very helpful if Zach could comment on what was intended here.Yes, this would aid my understanding a lot.
and mine too.
quoted
If it turns out that all three tests only want to consider a single commit then it would be be more efficient to run a single git command and check the output with something like git show -s --format='%(trailers:key=git-subtree-dir,key=git- subtree-mainline' $rev | while read trailer do # check trailers here using case "$trailer" doneThis is a cleaner approach, and I'll explore it for v2. Any objection to long options like `--no-patch` instead of `-s`? I find these are better for scripts since there's less hunting around in man pages.
Sure, I used '-s' out of habit but '--no-patch' would be clearer (especially as I can't see any link between the short and long option names in this case) Thanks Phillip