[RFC PATCH] We should add a "git gc --auto" after "git clone" due to commit graph
From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2018-10-04 21:42:15
On Wed, Oct 03 2018, Ævar Arnfjörð Bjarmason wrote:
Don't have time to patch this now, but thought I'd send a note / RFC about this.

Now that we have the commit graph it's nice to be able to set e.g. core.commitGraph=true & gc.writeCommitGraph=true in ~/.gitconfig or /etc/gitconfig to apply them to all repos. But when I clone e.g. linux.git, stuff like 'tag --contains' will be slow until whenever my first "gc" kicks in, which may be quite some time if I'm just using it passively.

So we should make "git gc --auto" be run on clone, and change the need_to_gc() / cmd_gc() behavior so that we detect that the gc.writeCommitGraph=true setting is on but we have no commit graph, and then just generate that without doing a full repack.

As an aside, such more granular "gc" would be nice for e.g. pack-refs too. It's possible for us to have just one pack, but 100k loose refs.

It might also be good to have some gc.autoDetachOnClone option and have it false by default, so we don't have a race condition where "clone linux && git -C linux tag --contains" is slow because the graph hasn't been generated yet. Generating the graph initially doesn't take that long compared to the time to clone a large repo (and on a small one it won't matter either way).

I was going to say "also for midx", but of course after clone we have just one pack, so I can't imagine us needing this. But I can see us having other such optional side-indexes in the future generated by gc, and they'd also benefit from this.
I don't have time to polish this up for submission now, but here's a WIP patch that implements this. Highlights:

 * There's a gc.clone.autoDetach=false default setting which overrides gc.autoDetach if 'git gc --auto' is run via git-clone (we just pass a --cloning option to indicate this).

 * A clone of say git.git with gc.writeCommitGraph=true looks like:

       [...]
       Receiving objects: 100% (255262/255262), 100.49 MiB | 17.78 MiB/s, done.
       Resolving deltas: 100% (188947/188947), done.
       Computing commit graph generation numbers: 100% (55210/55210), done.

 * The 'git gc --auto' command also knows to (only) write the commit-graph (and space is left for future optimization steps) if general GC isn't needed, but we need "optimization":

       $ rm .git/objects/info/commit-graph; ~/g/git/git --exec-path=$PWD -c gc.writeCommitGraph=true -c gc.autoDetach=false gc --auto;
       Annotating commits in commit graph: 341229, done.
       Computing commit graph generation numbers: 100% (165969/165969), done.
       $

 * The patch to gc.c looks less scary with -w; most of it is indenting the existing pack-refs etc. with a "!auto_gc || should_gc" condition.

 * I added a commit_graph_exists() function and only care if I get ENOENT for the purposes of this gc mode. This would need to be tweaked for the incremental mode Derrick talks about, but if we just set "should_optimize" that'll also work as far as gc --auto is concerned (e.g. on fetch, am, etc.).
diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1546833213..5759fbb067 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1621,7 +1621,19 @@ gc.autoPackLimit::
 
 gc.autoDetach::
 	Make `git gc --auto` return immediately and run in background
-	if the system supports it. Default is true.
+	if the system supports it. Default is true. Overridden by
+	`gc.clone.autoDetach` when running linkgit:git-clone[1].
+
+gc.clone.autoDetach::
+	Make `git gc --auto` return immediately and run in background
+	if the system supports it, but only when it is run by
+	linkgit:git-clone[1]. Default is false.
++
+This defaults to false because the only time there is work to do
+after a 'git clone' is when something like `gc.writeCommitGraph` is
+true. In that case we'd like to compute the optimized file before
+returning, so that commands that benefit from the commit graph aren't
+slow until it has been generated in the background.
 
 gc.bigPackThreshold::
 	If non-zero, all packs larger than this limit are kept when
diff --git a/builtin/clone.c b/builtin/clone.c
index 15b142d646..824c130ba5 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -897,6 +897,8 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	struct remote *remote;
 	int err = 0, complete_refs_before_fetch = 1;
 	int submodule_progress;
+	const char *argv_gc_auto[] = {"gc", "--auto", "--cloning", NULL};
+	const char *argv_gc_auto_quiet[] = {"gc", "--auto", "--cloning", "--quiet", NULL};
 
 	struct refspec rs = REFSPEC_INIT_FETCH;
 	struct argv_array ref_prefixes = ARGV_ARRAY_INIT;
@@ -1245,5 +1247,11 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 
 	refspec_clear(&rs);
 	argv_array_clear(&ref_prefixes);
+
+	if (0 <= option_verbosity)
+		run_command_v_opt_cd_env(argv_gc_auto, RUN_GIT_CMD, git_dir, NULL);
+	else
+		run_command_v_opt_cd_env(argv_gc_auto_quiet, RUN_GIT_CMD, git_dir, NULL);
+
 	return err;
 }
diff --git a/builtin/gc.c b/builtin/gc.c
index 6591ddbe83..27be03890a 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -43,6 +43,7 @@ static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
 static int gc_write_commit_graph;
 static int detach_auto = 1;
+static int detach_clone_auto;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
 static const char *prune_expire = "2.weeks.ago";
@@ -133,6 +134,7 @@ static void gc_config(void)
 	git_config_get_int("gc.autopacklimit", &gc_auto_pack_limit);
 	git_config_get_bool("gc.writecommitgraph", &gc_write_commit_graph);
 	git_config_get_bool("gc.autodetach", &detach_auto);
+	git_config_get_bool("gc.clone.autodetach", &detach_clone_auto);
 	git_config_get_expiry("gc.pruneexpire", &prune_expire);
 	git_config_get_expiry("gc.worktreepruneexpire", &prune_worktrees_expire);
 	git_config_get_expiry("gc.logexpiry", &gc_log_expire);
@@ -157,9 +159,6 @@ static int too_many_loose_objects(void)
 	int num_loose = 0;
 	int needed = 0;
 
-	if (gc_auto_threshold <= 0)
-		return 0;
-
 	dir = opendir(git_path("objects/17"));
 	if (!dir)
 		return 0;
@@ -369,10 +368,21 @@ static int need_to_gc(void)
 		return 0;
 
 	if (run_hook_le(NULL, "pre-auto-gc", NULL))
-		return 0;
+		return -1;
 	return 1;
 }
 
+static int need_to_optimize(void)
+{
+	if (gc_write_commit_graph) {
+		char *graph_name = get_commit_graph_filename(get_object_directory());
+		int exists = commit_graph_exists(graph_name);
+		free(graph_name);
+		return exists == 0; /* ENOENT: no commit graph yet */
+	}
+	return 0;
+}
+
 /* return NULL on success, else hostname running the gc */
 static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 {
@@ -491,6 +501,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 {
 	int aggressive = 0;
 	int auto_gc = 0;
+	int cloning = 0;
 	int quiet = 0;
 	int force = 0;
 	const char *name;
@@ -498,6 +509,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 	int daemonized = 0;
 	int keep_base_pack = -1;
 	timestamp_t dummy;
+	int should_gc;
+	int should_optimize;
 
 	struct option builtin_gc_options[] = {
 		OPT__QUIET(&quiet, N_("suppress progress reporting")),
@@ -507,6 +520,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		OPT_BOOL(0, "aggressive", &aggressive, N_("be more thorough (increased runtime)")),
 		OPT_BOOL_F(0, "auto", &auto_gc, N_("enable auto-gc mode"),
 			   PARSE_OPT_NOCOMPLETE),
+		OPT_BOOL_F(0, "cloning", &cloning, N_("enable cloning mode"),
+			   PARSE_OPT_NOCOMPLETE),
 		OPT_BOOL_F(0, "force", &force,
 			   N_("force running gc even if there may be another gc running"),
 			   PARSE_OPT_NOCOMPLETE),
@@ -555,22 +570,27 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		/*
 		 * Auto-gc should be least intrusive as possible.
 		 */
-		if (!need_to_gc())
+		should_gc = need_to_gc();
+		if (should_gc == -1)
+			return 0;
+		should_optimize = need_to_optimize();
+		if (!should_gc && !should_optimize)
 			return 0;
-		if (!quiet) {
+		if (!quiet && should_gc) {
 			if (detach_auto)
 				fprintf(stderr, _("Auto packing the repository in background for optimum performance.\n"));
 			else
 				fprintf(stderr, _("Auto packing the repository for optimum performance.\n"));
 			fprintf(stderr, _("See \"git help gc\" for manual housekeeping.\n"));
 		}
-		if (detach_auto) {
+		if (detach_auto &&
+		    (!cloning || detach_clone_auto)) {
 			if (report_last_gc_error())
 				return -1;
 
 			if (lock_repo_for_gc(force, &pid))
 				return 0;
-			if (gc_before_repack())
+			if (should_gc && gc_before_repack())
 				return -1;
 			delete_tempfile(&pidfile);
 
@@ -611,45 +631,48 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 		atexit(process_log_file_at_exit);
 	}
 
-	if (gc_before_repack())
-		return -1;
-
-	if (!repository_format_precious_objects) {
-		close_all_packs(the_repository->objects);
-		if (run_command_v_opt(repack.argv, RUN_GIT_CMD))
-			return error(FAILED_RUN, repack.argv[0]);
-
-		if (prune_expire) {
-			argv_array_push(&prune, prune_expire);
-			if (quiet)
-				argv_array_push(&prune, "--no-progress");
-			if (repository_format_partial_clone)
-				argv_array_push(&prune,
-						"--exclude-promisor-objects");
-			if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
-				return error(FAILED_RUN, prune.argv[0]);
+	if (!auto_gc || should_gc) {
+		if (gc_before_repack())
+			return -1;
+
+		if (!repository_format_precious_objects) {
+			close_all_packs(the_repository->objects);
+			if (run_command_v_opt(repack.argv, RUN_GIT_CMD))
+				return error(FAILED_RUN, repack.argv[0]);
+
+			if (prune_expire) {
+				argv_array_push(&prune, prune_expire);
+				if (quiet)
+					argv_array_push(&prune, "--no-progress");
+				if (repository_format_partial_clone)
+					argv_array_push(&prune,
+							"--exclude-promisor-objects");
+				if (run_command_v_opt(prune.argv, RUN_GIT_CMD))
+					return error(FAILED_RUN, prune.argv[0]);
+			}
 		}
-	}
 
-	if (prune_worktrees_expire) {
-		argv_array_push(&prune_worktrees, prune_worktrees_expire);
-		if (run_command_v_opt(prune_worktrees.argv, RUN_GIT_CMD))
-			return error(FAILED_RUN, prune_worktrees.argv[0]);
-	}
 
-	if (run_command_v_opt(rerere.argv, RUN_GIT_CMD))
-		return error(FAILED_RUN, rerere.argv[0]);
+		if (prune_worktrees_expire) {
+			argv_array_push(&prune_worktrees, prune_worktrees_expire);
+			if (run_command_v_opt(prune_worktrees.argv, RUN_GIT_CMD))
+				return error(FAILED_RUN, prune_worktrees.argv[0]);
+		}
 
-	report_garbage = report_pack_garbage;
-	reprepare_packed_git(the_repository);
-	if (pack_garbage.nr > 0)
-		clean_pack_garbage();
+		if (run_command_v_opt(rerere.argv, RUN_GIT_CMD))
+			return error(FAILED_RUN, rerere.argv[0]);
+
+		report_garbage = report_pack_garbage;
+		reprepare_packed_git(the_repository);
+		if (pack_garbage.nr > 0)
+			clean_pack_garbage();
+	}
 
 	if (gc_write_commit_graph)
 		write_commit_graph_reachable(get_object_directory(), 0,
 					     !quiet && !daemonized);
 
-	if (auto_gc && too_many_loose_objects())
+	if (auto_gc && should_gc && too_many_loose_objects())
 		warning(_("There are too many unreachable loose objects; "
 			"run 'git prune' to remove them."));
 
diff --git a/commit-graph.c b/commit-graph.c
index 5908bd4e34..a4a7c94cec 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -57,6 +57,18 @@ static struct commit_graph *alloc_commit_graph(void)
 	return g;
 }
 
+int commit_graph_exists(const char *graph_file)
+{
+	struct stat st;
+	if (stat(graph_file, &st)) {
+		if (errno == ENOENT)
+			return 0;
+		else
+			return -1;
+	}
+	return 1;
+}
+
 struct commit_graph *load_commit_graph_one(const char *graph_file)
 {
 	void *graph_map;
diff --git a/commit-graph.h b/commit-graph.h
index 5678a8f4ca..a251f1bc32 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -11,6 +11,7 @@
 struct commit;
 
 char *get_commit_graph_filename(const char *obj_dir);
+int commit_graph_exists(const char *graph_file);
 
 /*
  * Given a commit struct, try to fill the commit struct info, including: