Re: [PATCH 2/2] fetch-pack: warn if in commit graph but not obj db

From: Jonathan Tan <hidden>
Date: 2024-10-31 21:43:22

Taylor Blau [off-list ref] writes:

quoted

However, the action of first looking up the commit graph file is not
done everywhere in Git, especially if the type of the object at the time
of lookup is not known. This means that in a repo corruption situation,
a user may encounter an "object missing" error, attempt to fetch it, and
still encounter the same error later when they reattempt their original
action, because the object is present in the commit graph file but not in
the object DB.

I think the type of repository corruption here may be underspecified.

Hmm...if you have any specific points you'd like me to elaborate on (or
better yet, wording suggestions), please let me know.

You say that we have some object, say X, whose type is not known. So we
don't load the commit-graph, realize that X is missing, and then try and
fetch it.

Yes.

In this scenario, is X actually in the commit-graph, but not
in the object database?

Yes.

Further, if X is in the commit-graph, I assume
we do not look it up there because we first try and find its type, which
fails, so we assume we don't have it (despite it appearing corruptly in
the commit-graph)?

I think that matches the behavior you're describing, but I want to make
sure that I'm not thinking of something else.

Strictly speaking, we are not trying to find its type. We are trying
to find the object itself. (One could argue that if we find out that
an object is a commit, we can then ignore the packfile and go look up
the commit graph file. I'm not so sure this is a good idea, but this is
moot, I think - as far as I know, we currently don't do this.)

But yes, if the object is not in the object DB, we assume we don't have
it.

You discuss this a little bit in your commit message, but I wonder if we
should just die() here. I feel like we're trying to work around a
situation where the commit-graph is obviously broken because it refers
to commit objects that don't actually exist in the object store.

Yeah, that seems to be the consensus. I've switched it to a fatal error.

A few thoughts in this area:

  - What situation provokes this to be true? I could imagine there is
    some bug that we don't fully have a grasp of. But I wonder if it is
    even easier to provoke than that, say by pruning some objects out of
    the object store, then not rewriting the commit-graph, leaving some
    of the references dangling.

The fetching of promisor objects that are descendants of non-promisor
objects. [1]

I think that the rewriting of the commit graph happens on every repack,
thus avoiding the situation you describe (unless there is a bug there).

[1] https://lore.kernel.org/git/20241001191811.1934900-1-calvinwan@google.com/ (local)

  - Does 'git fsck' catch this case within the commit-graph?

Honestly, I haven't checked - I've been concentrating on fixing the
fetch part for now (and also the bug that caused the missing commits
[2]).

[2] https://lore.kernel.org/git/cover.1729792911.git.jonathantanmy@google.com/ (local)

  - Are the other areas of the code that rely on the assumption that all
    entries in the commit-graph actually exist on disk? If so, are they
    similarly broken?

Yes, the fetch negotiation code. It is not "broken" in that it solely
uses repo_parse_commit() which always checks the commit graph, so as
long as the commit graph has everything we need, there will be no error.

There might be other systems that rely both on the commit graph and the
object DB, and thus have an inconsistent view (so, "similarly broken" as
you describe it) but at least in the partial clone case, the severity of
the issue is not as high as in "fetch", because these other systems can
lazily fetch the missing commit and then proceed.

Another thought about this whole thing is that we essentially have a
code path that says: "I found this object from the commit-graph, but
don't know if I actually have it on disk, so mark it to be checked later
via has_object()".

I wonder if it would be more straightforward to replace the call to
lookup_commit_in_graph() with a direct call to has_object() in the
deref_without_lazy_fetch() function, which I think would both (a)
eliminate the need for a new flag bit to be allocated, and (b) prevent
looking up the object twice.

Thoughts?

Thanks,
Taylor

This would undo the optimization in 62b5a35a33 (fetch-pack: optimize
loading of refs via commit graph, 2021-09-01), and also would not work
without changes to the fetch negotiation code - I tried to describe it
in the commit message, perhaps not very clearly, but the issue is that
even if we emit "want X", the fetch negotiation code would emit "have
X" (the X is the same in both), and at least for our JGit server at
$DAYJOB, the combination of "want X" and "have X" results in the server
sending an empty packfile (reasonable behavior, I think). (And I don't
think the changes to the fetch negotiation code are worth it.)

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help