Re: [PATCH v2 0/7] subtree: Fix handling of complex history

[PATCH 0/7] subtree: Fix handling of complex history · Tom Clarkson via GitGitGadget <hidden> · 2020-05-11
[PATCH 1/7] subtree: handle multiple parents passed to cache_miss · Tom Clarkson via GitGitGadget <hidden> · 2020-05-11
[PATCH 2/7] subtree: exclude commits predating add from recursive processing · Tom Clarkson via GitGitGadget <hidden> · 2020-05-11
[PATCH 4/7] subtree: add git subtree map command · Tom Clarkson via GitGitGadget <hidden> · 2020-05-11
[PATCH 3/7] subtree: persist cache between split runs · Tom Clarkson via GitGitGadget <hidden> · 2020-05-11
[PATCH 5/7] subtree: add git subtree use and ignore commands · Tom Clarkson via GitGitGadget <hidden> · 2020-05-11
[PATCH 7/7] subtree: document new subtree commands · Tom Clarkson via GitGitGadget <hidden> · 2020-05-11
[PATCH 6/7] subtree: more robustly distinguish subtree and mainline commits · Tom Clarkson via GitGitGadget <hidden> · 2020-05-11
Re: [PATCH 0/7] subtree: Fix handling of complex history · Ed Maste <hidden> · 2020-10-04
Re: [PATCH 0/7] subtree: Fix handling of complex history · Johannes Schindelin <hidden> · 2020-10-05
Re: [PATCH 0/7] subtree: Fix handling of complex history · Ed Maste <hidden> · 2020-10-05
Re: [PATCH 0/7] subtree: Fix handling of complex history · Johannes Schindelin <hidden> · 2020-10-07
[PATCH v2 0/7] subtree: Fix handling of complex history · Tom Clarkson via GitGitGadget <hidden> · 2020-10-06
[PATCH v2 4/7] subtree: add git subtree map command · Tom Clarkson via GitGitGadget <hidden> · 2020-10-06
[PATCH v2 1/7] subtree: handle multiple parents passed to cache_miss · Tom Clarkson via GitGitGadget <hidden> · 2020-10-06
Re: [PATCH v2 1/7] subtree: handle multiple parents passed to cache_miss · Ed Maste <hidden> · 2020-10-07
[PATCH v2 5/7] subtree: add git subtree use and ignore commands · Tom Clarkson via GitGitGadget <hidden> · 2020-10-06
Re: [PATCH v2 5/7] subtree: add git subtree use and ignore commands · Johannes Schindelin <hidden> · 2020-10-07
[PATCH v2 6/7] subtree: more robustly distinguish subtree and mainline commits · Tom Clarkson via GitGitGadget <hidden> · 2020-10-06
Re: [PATCH v2 6/7] subtree: more robustly distinguish subtree and mainline commits · Johannes Schindelin <hidden> · 2020-10-07
[PATCH v2 3/7] subtree: persist cache between split runs · Tom Clarkson via GitGitGadget <hidden> · 2020-10-06
Re: [PATCH v2 3/7] subtree: persist cache between split runs · Johannes Schindelin <hidden> · 2020-10-07
[PATCH v2 2/7] subtree: exclude commits predating add from recursive processing · Tom Clarkson via GitGitGadget <hidden> · 2020-10-06
Re: [PATCH v2 2/7] subtree: exclude commits predating add from recursive processing · Johannes Schindelin <hidden> · 2020-10-07
[PATCH v2 7/7] subtree: document new subtree commands · Tom Clarkson via GitGitGadget <hidden> · 2020-10-06
Re: [PATCH v2 7/7] subtree: document new subtree commands · Johannes Schindelin <hidden> · 2020-10-07
Re: [PATCH v2 0/7] subtree: Fix handling of complex history · Johannes Schindelin <hidden> · 2020-10-07

From: Johannes Schindelin <hidden>
Date: 2020-10-07 19:46:45

Hi Tom,

On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:

> Fixes several issues that could occur when running subtree split on large
> repos with more complex history.
>
>  1. A merge commit could bypass the known start point of the subtree, which
>     would cause the entire history to be processed recursively, leading to a
>     stack overflow / segfault after reading a few hundred commits. Older
>     commits are now explicitly recorded as irrelevant so that the recursive
>     process can terminate on any mainline commit rather than only on subtree
>     joins and initial commits.
>
>  2. It is possible for a repo to contain subtrees that lack the metadata
>     that is usually present in add/join commit messages (git-svn at least
>     can produce such a structure). The new use/ignore/map commands allow the
>     user to provide that information for any problematic commits.
>
>  3. A mainline commit that does not contain the subtree folder could be
>     erroneously identified as a subtree commit, which would add the entire
>     mainline history to the subtree. Commits will now only be used as is if
>     all their parents are already identified as subtree commits. While the
>     new code can still be tripped up by unusual folder structures, the
>     completely unambiguous solution turned out to involve a significant
>     performance penalty, and the new ignore / use commands provide a
>     workaround for that scenario.
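
A minimal sketch of the ideas in items 1 and 3, written in Python purely for illustration (git-subtree itself is a shell script, and nothing here is taken from the patch). The function `classify`, the status strings, and the in-memory `parents`/`cache` dictionaries are hypothetical. It walks ancestors with an explicit stack instead of recursion, stops at any commit already in the cache (including commits marked irrelevant because they predate the add), and treats a commit as a subtree commit only when it has parents and every one of them is already a known subtree commit. It deliberately ignores the tree comparisons a real split performs.

```python
from typing import Dict, List

# Hypothetical status values; git-subtree's real cache stores commit ids.
SUBTREE = "subtree"
MAINLINE = "mainline"
IRRELEVANT = "irrelevant"


def classify(start: str, parents: Dict[str, List[str]], cache: Dict[str, str]) -> str:
    """Classify `start` and its unclassified ancestors without recursion.

    `cache` is seeded with commits whose status is already known (e.g. from
    add/join metadata, or IRRELEVANT because they predate the add) and is
    updated in place. Ancestors of cached commits are never visited.
    """
    stack = [start]
    while stack:
        commit = stack[-1]
        if commit in cache:
            # Already classified (possibly pushed twice via a merge): skip it.
            stack.pop()
            continue
        pending = [p for p in parents[commit] if p not in cache]
        if pending:
            # Classify parents first. The explicit stack replaces recursion,
            # so a deep history cannot overflow the call stack.
            stack.extend(pending)
            continue
        stack.pop()
        statuses = [cache[p] for p in parents[commit]]
        # Use a commit as a subtree commit only if it has parents and
        # every one of them is already known to be a subtree commit.
        if statuses and all(s == SUBTREE for s in statuses):
            cache[commit] = SUBTREE
        else:
            cache[commit] = MAINLINE
    return cache[start]
```

For a merge whose parents are one mainline commit and one known subtree commit, this sketch returns `"mainline"`; a looser rule that accepted a commit when any parent was a subtree commit would instead pull that merge, and the mainline history behind it, into the subtree.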

I gave this as thorough a review as I could (which is not saying too much,
as I am not exactly familiar with `git subtree`'s inner workings).

Hopefully some of my comments and suggestions are helpful.

At some stage, especially given the problems I pointed out with the
implementation detail that is a flat directory with a potentially insane
number of files in it, I think it would make a lot of sense to go ahead
and turn this into a built-in Git command, implemented in C, and with a
more robust file system layout of its cache.
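
One possible layout, sketched in Python under the hypothetical assumption that the cache stores one small file per commit, named by the commit's full hex object id: shard the entries by the first two hex digits, the same fan-out Git uses for loose objects under `.git/objects`. The names `entry_path`, `write_entry` and `read_entry` are illustrative only and do not come from the patch.

```python
from pathlib import Path
from typing import Optional


def entry_path(cache_dir: Path, commit: str) -> Path:
    """Map a hex object id to <cache_dir>/<first 2 hex digits>/<remaining digits>."""
    return cache_dir / commit[:2] / commit[2:]


def write_entry(cache_dir: Path, commit: str, value: str) -> None:
    """Store `value` for `commit`, creating the fan-out directory on demand."""
    path = entry_path(cache_dir, commit)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(value + "\n")


def read_entry(cache_dir: Path, commit: str) -> Optional[str]:
    """Return the cached value for `commit`, or None if there is no entry."""
    path = entry_path(cache_dir, commit)
    if not path.is_file():
        return None
    return path.read_text().strip()
```

With 256 subdirectories, a million cached commits works out to fewer than 4,000 files per directory rather than a million in a single one.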

Ciao,
Dscho
