Thread (65 messages) 65 messages, 7 authors, 2018-10-16

Re: We should add a "git gc --auto" after "git clone" due to commit graph

From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2018-10-03 20:35:31

On Wed, Oct 03 2018, Jeff King wrote:
On Wed, Oct 03, 2018 at 12:08:15PM -0700, Stefan Beller wrote:
quoted
I share these concerns in a slightly more abstract way, as
I would bucket the actions into two separate bins:

One bin that throws away information.
this would include removing expired reflog entries (which
I do not think are garbage, or collection thereof), but their
usefulness is questionable.

The other bin would be actions that optimize but
do not throw away any information, repacking (without
dropping files) would be part of it, or the new
"write additional files".

Maybe we can move all actions of the second bin into a new
"git optimize" command, and git gc would do first the "throw away
things" and then the optimize action, whereas clone would only
go for the second optimizing part?
One problem with that world-view is that some of the operations do
_both_, for efficiency. E.g., repacking will drop unreachable objects in
too-old packs. We could actually be more aggressive in combining things
here. For instance, a full object graph walk in linux.git takes 30-60
seconds, depending on your CPU. But we do it at least twice during a gc:
once to repack, and then again to determine reachability for pruning.

If you generate bitmaps during the repack step, you can use them during
the prune step. But by itself, the cost of generating the bitmaps
generally outweighs the extra walk. So it's not worth generating them
_just_ for this (but is an obvious optimization for a server which would
be generating them anyway).
I don't mean to fan the flames of this obviously controversial "git gc
does optimization" topic (which I didn't suspect there would be a debate
about...), but a related thing I was wondering about the other day is
whether we could have a gc.fsck option, and in the background do fsck
while we were at it, and report this back via some facility like
gc.log[1].

That would also fall into this category of more work we could do while
we're doing a full walk anyway, but as with what you're suggesting would
require some refactoring.

1. Well, one that doesn't suck, see
   https://public-inbox.org/git/87inc89j38.fsf@evledraar.gmail.com/ /
   https://public-inbox.org/git/87d0vmck55.fsf@evledraar.gmail.com/ etc.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help