Re: 30min Script in git 2.7.4 takes 22+ hrs in git 2.9.3
From: Jeff King <hidden>
Date: 2017-04-27 20:42:27
On Thu, Apr 27, 2017 at 04:09:56PM -0400, Jeff King wrote:
> On Thu, Apr 27, 2017 at 12:36:54PM -0400, Robert Stryker wrote:
>
> > The problem: the script takes 30 minutes for one environment including
> > git 2.7.4, and generates a repo of about 30mb. When run by a coworker
> > using git 2.9.3, it takes 22+ hours and generates a 10gb repo.
> >
> > Clearly something here is very wrong. Either there's a pretty horrible
> > regression or my idea is a pretty bad one ;)
>
> The large size makes me think that you're getting an auto-gc in the
> middle that is exploding the unreachable objects into loose storage.
> This can happen when objects are ready to be pruned, but Git holds on to
> them for a grace period (2 weeks by default) as a precaution against
> simultaneous use. Try doing:
>
>   git config gc.auto 0
>
> in the repositories before the slow step. Or alternatively, try:
>
>   git config gc.pruneExpire now
>
> which will continue to do the auto-gc, but throw away unreachable
> objects immediately.
>
> Or alternatively, we're failing to run gc at all and just getting tons
> of loose objects that need to be packed. What does "git gc --auto" say
> if you run it in the slow repository? Does it improve the disk space
> problem?
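Before changing any config, it can help to check whether the repository really is sitting on a pile of loose objects. A minimal sketch, assuming it is run inside the affected repository; `loose_object_report` is a hypothetical helper, not a git command or anything from the thread. It reads the loose object count from `git count-objects -v` and compares it with `gc.auto` (6700 when unset).

```shell
#!/bin/sh
# Hypothetical helper (not a git command): compare the number of loose
# objects with the gc.auto threshold used by "git gc --auto".
# Assumes the current directory is inside a git repository.
loose_object_report() {
    # The "count:" line of "git count-objects -v" is the loose object count.
    count=$(git count-objects -v | sed -n 's/^count: //p')
    # gc.auto defaults to 6700 when it is not set in any config file.
    threshold=$(git config --get gc.auto || echo 6700)
    echo "loose objects: $count (gc.auto threshold: $threshold)"
    if [ "$threshold" -eq 0 ]; then
        echo "auto-gc is disabled (gc.auto=0)"
    elif [ "$count" -gt "$threshold" ]; then
        # "git gc --auto" only estimates the loose count by sampling,
        # so this comparison is approximate.
        echo "over the threshold: 'git gc --auto' should try to repack"
    fi
}
```

Running `loose_object_report` in the slow repository before and after `git gc --auto` shows whether the auto-gc actually brought the loose object count down.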
Fiddling with your script a bit, I have a suspect. Between your two
versions of git, we started disallowing merge of unrelated histories by
default[1]. Which is exactly what your script is doing:

  echo "Merge in the four rewritten projects, with generic commit messages"
  git pull --no-edit webtools.common.fproj
  git pull --no-edit webtools.common
  git pull --no-edit webtools.common.tests
  git pull --no-edit webtools.common.snippets

If you run under "set -e", or just put "|| exit 1" after those, you'll
see that they fail with v2.9.3 and newer.
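The failure is easy to reproduce in isolation. A minimal sketch using two throwaway repositories; the names `main` and `other` and their files are made up for illustration. Each gets an unrelated root commit, then one is pulled into the other. `--no-rebase` is an addition beyond the original script, included as an assumption so that recent git versions, which may refuse a divergent pull until a reconcile strategy is chosen, get as far as the merge itself.

```shell
#!/bin/sh
# Sketch: reproduce the "refusing to merge unrelated histories" failure
# with two throwaway repositories and show that the exit status of
# "git pull" reports it. Repository and file names are made up.
work=$(mktemp -d)
cd "$work" || exit 1

# Identity for the throwaway commits, so the sketch does not depend on
# the user's own git configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Two repositories whose histories have no commit in common.
for repo in main other; do
    git init -q "$repo"
    echo "$repo" > "$repo/$repo.txt"
    git -C "$repo" add "$repo.txt"
    git -C "$repo" commit -q -m "root of $repo"
done

# On the git versions discussed here (v2.9.3 and newer), this merge is
# refused. A script that ignores the exit status just carries on.
if git -C main pull -q --no-rebase --no-edit ../other; then
    pull_status=0
else
    pull_status=$?
fi
echo "pull exit status: $pull_status (non-zero means the pull was refused)"
```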
So what I think is happening is that we never create that shared
history, and then your per-tag work is building further on a nonsense
fake history. That has two implications:
- as the divergent history in the shared repo gets bigger and bigger,
the fetches have to do more and more work to try to find a common
ancestor (but of course they'll never find one, because the two
histories aren't related)
- the divergent history racks up tons of unreachable objects, which
auto-gc won't pack. After the script has been running for a while, you
can see that auto-gc finishes repacking but then fails with "There are
too many unreachable loose objects". Due to the way background gc works
these days, that blocks further auto-gc from running until the situation
is resolved. So you just rack up tons of loose objects, which explains
the disk usage.
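Recent git versions record a failed background auto-gc in a `gc.log` file in the repository's git directory and skip further auto-gc while it is present (or, in newer versions, until it is older than `gc.logExpiry`). A minimal sketch of clearing that state; `clear_stuck_auto_gc` is a hypothetical helper name. It shows the stored log, prunes unreachable loose objects with no grace period, and removes the log. Pruning with `--expire=now` is only safe when nothing else is using the repository at the same time, which is exactly what the default grace period guards against, and it does not fix the cause: the unrelated-history pulls still need to be fixed.

```shell
#!/bin/sh
# Hypothetical helper (not a git command): clear the state left behind by a
# failed background auto-gc so that later "git gc --auto" runs can proceed.
# Assumes the current directory is inside the affected repository and that
# no other git process is using it while this runs.
clear_stuck_auto_gc() {
    log="$(git rev-parse --git-dir)/gc.log"
    # Nothing to do if no failed auto-gc has left a log behind.
    [ -f "$log" ] || return 0
    echo "previous auto-gc reported:"
    cat "$log"
    # Drop unreachable loose objects immediately instead of keeping them
    # for the grace period gc would normally apply.
    git prune --expire=now || return 1
    rm -f "$log"
}
```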
Try adding "--allow-unrelated-histories" to your git-pull invocations.
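Applied to the four pulls from the script, the fix might look like the sketch below. `merge_projects` is a hypothetical helper name; it assumes, as the original script does, that the current directory is the combined repository and that each argument is something `git pull` accepts (a path, URL, or remote name). It stops at the first failed pull, as `set -e` or `|| exit 1` would, and `--no-rebase` is again an addition beyond the original script for the benefit of recent git versions.

```shell
#!/bin/sh
# Hypothetical helper: pull each given repository into the current one,
# allowing histories that share no common ancestor, and stop at the first
# pull that fails instead of carrying on with a half-built history.
merge_projects() {
    for project in "$@"; do
        git pull --no-edit --no-rebase --allow-unrelated-histories "$project" ||
            return 1
    done
}
```

In the script, the four pulls would then become a single call such as `merge_projects webtools.common.fproj webtools.common webtools.common.tests webtools.common.snippets || exit 1`.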
-Peff
[1] See e379fdf34 (merge: refuse to create too cool a merge by default, 2016-03-18)