Thread (22 messages) 22 messages, 3 authors, 2023-06-09

Re: [PATCH v3 2/2] builtin/pack-objects.c: introduce `pack.recentObjectsHook`

From: Jeff King <hidden>
Date: 2023-05-12 21:45:47
Subsystem: the rest · Maintainer: Linus Torvalds

On Fri, May 12, 2023 at 05:24:56PM -0400, Jeff King wrote:
quoted
This patch introduces a new configuration, `pack.recentObjectsHook`
which allows the caller to specify a program (or set of programs) whose
output is treated as a set of objects to treat as recent, regardless of
their true age.
I was going to complain about putting this in the "pack" section,
because I thought by touching reachable.c, we'd also affect git-prune.
But I don't think we do, because it does its own direct mtime check on
the loose objects.

But I'm not sure that's the right behavior.

It feels like even before your patch, this is a huge gap in our
object-retention strategy.  During repacking, we try to avoid dropping
objects which are reachable from recent-but-unreachable things we're
keeping (since otherwise it effectively corrupts those recent objects,
making them less valuable to keep). But git-prune will happily drop them
anyway!

And I think the same thing would apply to your hook. If the hook says
"object XYZ is precious even if unreachable, keep it", then git-prune
ignoring that seems like it would be a source of errors.

I suspect both could be fixed by having git-prune trigger the same
add_unseen_recent_objects_to_traversal() call either as part of
the perform_reachability_traversal() walk, or maybe in its own walk (I
think maybe it has to be its own because the second walk should avoid
complaining about missing objects).
<phew> I am happy to say that I was wrong here, and git-prune behaves as
it should, courtesy of d3038d22f9 (prune: keep objects reachable from
recent objects, 2014-10-15). The magic happens in mark_reachable_objects(),
which handles walking the recent objects by calling...you guessed it,
add_unseen_recent_objects_to_traversal().

So it does the right thing now, and your patch should kick in
automatically for git-prune, too. But I think we'd want two things:

  1. Should the config variable name be made more generic to match?
     Maybe "core" is too broad (though certainly I'd expect it to apply
     anywhere in Git where we check recent-ness of objects), but perhaps
     "gc" would make sense (even though it is not strictly part of the
     gc command, it is within that realm of concepts).

  2. We probably want a test covering git-prune in this situation.

The thing that confused me when looking at the code earlier is that
git-prune itself checks the mtime of the objects, too, even if they were
not mentioned in the recent-but-reachable walk. That seems redundant to
me, but somehow it isn't. If I do:
diff --git a/builtin/prune.c b/builtin/prune.c
index 5dc9b20720..22b7ce4b10 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -92,7 +92,7 @@ static int prune_object(const struct object_id *oid, const char *fullpath,
 		return 0;
 	}
 	if (st.st_mtime > expire)
-		return 0;
+		BUG("object should have been saved via is_object_reachable!");
 	if (show_only || verbose) {
 		enum object_type type = oid_object_info(the_repository, oid,
 							NULL);
then t5304 shows some failures. I'm not quite sure what is going on, but
_if_ that mtime check in prune_object() is important, then it probably
should also respect the hook mechanism. So there may be a (3) to add to
your patch there.

-Peff
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help