Thread: 3 messages, 2 authors, 2017-04-27

Re: 30min Script in git 2.7.4 takes 22+ hrs in git 2.9.3

From: Jeff King <hidden>
Date: 2017-04-27 20:42:27

On Thu, Apr 27, 2017 at 04:09:56PM -0400, Jeff King wrote:

> On Thu, Apr 27, 2017 at 12:36:54PM -0400, Robert Stryker wrote:
>
> > The problem:  the script takes 30 minutes for one environment
> > including git 2.7.4, and generates a repo of about 30mb.   When run by
> > a coworker using git 2.9.3, it takes 22+ hours and generates a 10gb
> > repo.
> >
> > Clearly something here is very wrong. Either there's a pretty horrible
> > regression or my idea is a pretty bad one ;)
>
> The large size makes me think that you're getting an auto-gc in the
> middle that is exploding the unreachable objects into loose storage.
> This can happen when objects are ready to be pruned, but Git holds on to
> them for a grace period (2 weeks by default) as a precaution against
> simultaneous use.
>
> Try doing:
>
>   git config gc.auto 0
>
> in the repositories before the slow step. Or alternatively, try:
>
>   git config gc.pruneExpire now
>
> which will continue to do the auto-gc, but throw away unreachable
> objects immediately.
>
> Or alternatively, we're failing to run gc at all and just getting tons
> of loose objects that need packed. What does running "git gc --auto" say
> if you run it in the slow repository? Does it improve the disk space
> problem?

Fiddling with your script a bit, I have a suspect. Between your two
versions of git, we started disallowing merge of unrelated histories by
default[1]. Which is exactly what your script is doing:

  echo "Merge in the four rewritten projects, with generic commit messages"
  git pull --no-edit webtools.common.fproj
  git pull --no-edit webtools.common
  git pull --no-edit webtools.common.tests
  git pull --no-edit webtools.common.snippets

If you run under "set -e", or just put "|| exit 1" after those, you'll
see that they fail with v2.9.3 and newer.
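
A minimal sketch of the "|| exit 1" variant, assuming the four remotes are
configured exactly as in your script. The function name merge_projects is
made up for illustration and is not part of the original script:

```sh
#!/bin/sh
# Sketch: stop at the first pull that fails instead of carrying on.
# merge_projects is a hypothetical helper name, not from the original script.
merge_projects () {
	for remote in webtools.common.fproj webtools.common \
		webtools.common.tests webtools.common.snippets
	do
		# A refused merge makes git pull exit non-zero; abort right there.
		git pull --no-edit "$remote" || {
			echo "pull from $remote failed; aborting" >&2
			exit 1
		}
	done
}
```

Calling merge_projects where the script currently runs the four pulls should
make the failure visible immediately, rather than hours later as a symptom.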

So what I think is happening is that we never create that shared
history, and then your per-tag work is building further on a nonsense
fake history. That has two implications:

  - as the divergent history in the shared repo gets bigger and bigger,
    the fetches have to do more and more work to try to find a common
    ancestor (but of course they'll never find one, because the two
    histories aren't related)

  - the divergent history racks up tons of unreachable objects, which
    auto-gc won't pack. After a while of the script running, you can see
    that auto-gc fails with "There are too many unreachable loose
    objects" after the pack. Due to the way background gc works these
    days, that blocks further auto-gc from running until the situation
    is resolved. And you just rack up tons of loose objects, which
    explains the disk usage.
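
To check whether a repository is in that state, something like the following
sketch may help. It assumes, as I understand the background gc behavior, that
the error from a failed auto-gc is saved in a gc.log file inside the git
directory. The function name report_gc_state is hypothetical:

```sh
#!/bin/sh
# Sketch: show loose-object buildup and any saved auto-gc error.
# report_gc_state is a hypothetical helper name.
report_gc_state () {
	(
		cd "${1:-.}" || exit 1
		# "count" is the number of loose objects, "size" their disk use in KiB.
		git count-objects -v || exit 1
		# Assumption: a failed background gc leaves its message in gc.log.
		log=$(git rev-parse --git-path gc.log) || exit 1
		if test -s "$log"
		then
			echo "auto-gc is blocked by $log:"
			cat "$log"
		else
			echo "no saved auto-gc error"
		fi
	)
}
```

Run it as "report_gc_state /path/to/shared/repo". A large "count" together
with a non-empty gc.log would match the explanation above.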

Try adding "--allow-unrelated-histories" to your git-pull invocation.
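
If you want to see the behavior in isolation first, here is a small
reproduction sketch in a scratch directory. The repository names "shared" and
"project" and the demo identity are made up. The --no-rebase option is my
addition: newer Git releases want an explicit merge-versus-rebase choice when
pulling divergent histories, and it should be harmless on older ones.

```sh
#!/bin/sh
# Sketch: reproduce the unrelated-histories refusal with two throwaway repos.
# All names below are invented for the demonstration.
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

scratch=$(mktemp -d) || exit 1

# Create a repository with a single root commit of its own.
make_repo () {
	git init -q "$scratch/$1" &&
	echo "$1" >"$scratch/$1/$1.txt" &&
	git -C "$scratch/$1" add "$1.txt" &&
	git -C "$scratch/$1" commit -q -m "initial commit in $1"
}

make_repo shared && make_repo project || exit 1
cd "$scratch/shared" || exit 1

# The two histories share no ancestor, so a plain pull is expected to be
# refused by v2.9.0 and later.
if git pull --no-edit --no-rebase ../project >/dev/null 2>&1
then
	plain=merged
else
	plain=refused
fi

# Explicitly allowing the merge creates the shared history.
if git pull --no-edit --no-rebase --allow-unrelated-histories ../project >/dev/null 2>&1
then
	allowed=merged
else
	allowed=refused
fi

echo "without flag: $plain; with flag: $allowed"
```

In your script the change would just be the extra option on each of the four
pulls.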

-Peff

[1] See e379fdf34 (merge: refuse to create too cool a merge by default, 2016-03-18)