Thread (25 messages) 25 messages, 5 authors, 2024-11-05

Re: [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db

From: Josh Steadmon <hidden>
Date: 2024-10-30 21:22:57

On 2024.10.29 14:11, Jonathan Tan wrote:
quoted hunk ↗ jump to hunk
When fetching, there is a step in which sought objects are first checked
against the local repository; only objects that are not in the local
repository are then fetched. This check first looks up the commit graph
file, and returns "present" if the object is in there.

However, the action of first looking up the commit graph file is not
done everywhere in Git, especially if the type of the object at the time
of lookup is not known. This means that in a repo corruption situation,
a user may encounter an "object missing" error, attempt to fetch it, and
still encounter the same error later when they reattempt their original
action, because the object is present in the commit graph file but not in
the object DB.

Therefore, detect when this occurs and print a warning. (Note that
we cannot proceed to include this object in the list of objects to
be fetched without changing at least the fetch negotiation code:
what would happen is that the client will send "want X" and "have X"
and when I tested at $DAYJOB with a work server that uses JGit, the
server reasonably returned an empty packfile. And changing the fetch
negotiation code to only use the object DB when deciding what to report
as "have" would be an unnecessary slowdown, I think.)

This was discovered when a lazy fetch of a missing commit completed with
nothing actually fetched, and the writing of the commit graph file after
every fetch then attempted to read said missing commit, triggering a
lazy fetch of said missing commit, resulting in an infinite loop with no
user-visible indication (until they check the list of processes running
on their computer). With this fix, at least a warning message will be
printed. Note that although the repo corruption we discovered was caused
by a bug in GC in a partial clone, the behavior that this patch teaches
Git to warn about applies to any repo with commit graph enabled and with
a missing commit, whether it is a partial clone or not.

Signed-off-by: Jonathan Tan <redacted>
---
 fetch-pack.c | 22 +++++++++++++++++++---
 object.h     |  2 +-
 2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/fetch-pack.c b/fetch-pack.c
index 6728a0d2f5..5a0020366b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -57,6 +57,7 @@ static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 #define ALTERNATE	(1U << 1)
 #define COMMON		(1U << 6)
 #define REACH_SCRATCH	(1U << 7)
+#define COMPLETE_FROM_COMMIT_GRAPH	(1U << 8)
We're defining a new flag, and we note it in object.h as well below, so
looks good so far.

quoted hunk ↗ jump to hunk
 /*
  * After sending this many "have"s if we do not get any new ACK , we
@@ -123,15 +124,18 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
 }
 
 static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
-					       int mark_tags_complete)
+					       int mark_additional_complete_information)
We're already marking some completion flags here, so we're just making
the parameter name more descriptive, OK.

 {
 	enum object_type type;
 	struct object_info info = { .typep = &type };
 	struct commit *commit;
 
 	commit = lookup_commit_in_graph(the_repository, oid);
-	if (commit)
+	if (commit) {
+		if (mark_additional_complete_information)
+			commit->object.flags |= COMPLETE_FROM_COMMIT_GRAPH;
 		return commit;
+	}
We already have a case where we're checking the commit graph, so we can
also mark the commit complete here... well, not the original "COMPLETE"
flag since we don't want to change behavior, but our new
COMPLETE_FROM_COMMIT_GRAPH flag. Sounds good.

quoted hunk ↗ jump to hunk
 
 	while (1) {
 		if (oid_object_info_extended(the_repository, oid, &info,
@@ -143,7 +147,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 
 			if (!tag->tagged)
 				return NULL;
-			if (mark_tags_complete)
+			if (mark_additional_complete_information)
 				tag->object.flags |= COMPLETE;
 			oid = &tag->tagged->oid;
 		} else {
@@ -809,6 +813,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 	save_commit_buffer = old_save_commit_buffer;
 }
 
+static void warn_in_commit_graph_only(const struct object_id *oid)
+{
+	warning(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database."),
+		oid_to_hex(oid));
+	warning(_("This is probably due to repo corruption."));
+	warning(_("If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."));
+}
+
Here's the new warning. As mentioned in my reply to the cover letter, I
feel like it makes more sense to die(), but I don't feel too strongly
about it.

quoted hunk ↗ jump to hunk
 /*
  * Returns 1 if every object pointed to by the given remote refs is available
  * locally and reachable from a local ref, and 0 otherwise.
@@ -830,6 +842,10 @@ static int everything_local(struct fetch_pack_args *args,
 				      ref->name);
 			continue;
 		}
+		if (o->flags & COMPLETE_FROM_COMMIT_GRAPH) {
+			if (!has_object(the_repository, remote, 0))
+				warn_in_commit_graph_only(remote);
+		}
And now that we're checking what's local, we issue our warning if we
have an object missing from the DB but mentioned in the commit graph.
Seems fine, although I wonder if it makes more sense to fail earlier. It
looks like the only place we do the
`mark_additional_complete_information` checks is in `mark_complete()`,
so should we just check this condition there? No strong feelings either
way, just curious.

quoted hunk ↗ jump to hunk
 		print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
 			      ref->name);
 	}
diff --git a/object.h b/object.h
index 17f32f1103..196e489253 100644
--- a/object.h
+++ b/object.h
@@ -65,7 +65,7 @@ void object_array_init(struct object_array *array);
 /*
  * object flag allocation:
  * revision.h:               0---------10         15               23------27
- * fetch-pack.c:             01    67
+ * fetch-pack.c:             01    6-8
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
  * upload-pack.c:                4       11-----14  16-----19
-- 
2.47.0.163.g1226f6d8fa-goog
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help