Re: [PATCH v2 6/7] subtree: more robustly distinguish subtree and mainline commits
From: Johannes Schindelin <hidden>
Date: 2020-10-07 19:42:55
Hi, On Tue, 6 Oct 2020, Tom Clarkson via GitGitGadget wrote:
From: Tom Clarkson <redacted> Prevent a mainline commit without $dir being treated as a subtree commit and pulling in the entire mainline history. Any valid subtree commit will have only valid subtree commits as parents, which will be unchanged by check_parents.
I feel like this is only half the picture because I have a hard time stitching these two sentences together. After studying the code and your patch a bit, it appears to me that `process_split_commit()` calls `check_parents()` first, which will call `process_split_commit()` for all as yet unmapped parents. So basically, it recurses until it found a commit all of whose parents are already mapped, then permeates that information all the way back. Doesn't this cause serious issues with stack overflows and all for long commit histories?
quoted hunk ↗ jump to hunk
Signed-off-by: Tom Clarkson <redacted> --- contrib/subtree/git-subtree.sh | 24 +++++++++++------------- 1 file changed, 11 insertions(+), 13 deletions(-)diff --git a/contrib/subtree/git-subtree.sh b/contrib/subtree/git-subtree.sh index e56621a986..fa6293b372 100755 --- a/contrib/subtree/git-subtree.sh +++ b/contrib/subtree/git-subtree.sh@@ -224,8 +224,6 @@ cache_setup () { fi mkdir -p "$cachedir" || die "Can't create new cachedir: $cachedir" - mkdir -p "$cachedir/notree" || - die "Can't create new cachedir: $cachedir/notree"
It might make sense to talk about this a bit in the commit message. Essentially, you are replacing the `notree/<rev>` files by mapping `<rev>` to the empty string. This makes me wonder, again, whether the file system layout of the cache can hold up to the demands. If a main project were to merge a subtree with, say, 10 million commits, wouldn't that mean that `git subtree` would now fill one directory with 10 million files? I cannot imagine that this performs well, still.
quoted hunk ↗ jump to hunk
debug "Using cachedir: $cachedir" >&2 }@@ -255,18 +253,11 @@ check_parents () { local indent=$(($2 + 1)) for miss in $missed do - if ! test -r "$cachedir/notree/$miss" - then - debug " unprocessed parent commit: $miss ($indent)" - process_split_commit "$miss" "" "$indent" - fi + debug " unprocessed parent commit: $miss ($indent)" + process_split_commit "$miss" "" "$indent"
That makes sense to me, as the `missed` variable only contains as yet unmapped commits, therefore we do not have to have an equivalent `test -r` check. Ciao, Dscho
quoted hunk ↗ jump to hunk
done } -set_notree () { - echo "1" > "$cachedir/notree/$1" -} - cache_set () { oldrev="$1" newrev="$2"@@ -719,11 +710,18 @@ process_split_commit () { # vs. a mainline commit? Does it matter? if test -z "$tree" then - set_notree "$rev" if test -n "$newparents" then - cache_set "$rev" "$rev" + if test "$newparents" = "$parents" + then + # if all parents were subtrees, this can be a subtree commit + cache_set "$rev" "$rev" + else + # a mainline commit with tree missing is equivalent to the initial commit + cache_set "$rev" "" + fi else + # no parents with valid subtree mappings means a commit prior to subtree add cache_set "$rev" "" fi return --gitgitgadget