Thread (31 messages) 31 messages, 3 authors, 2025-03-13

Re: [PATCH 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs

From: Taylor Blau <hidden>
Date: 2025-02-27 23:03:12

On Thu, Feb 27, 2025 at 11:26:32AM -0800, Elijah Newren wrote:
quoted
However, this process breaks down when we attempt to freshen an object
packed in an earlier cruft pack that is larger than the threshold and
thus will survive the repack.
...packed in an earlier cruft pack, and that cruft pack is larger than
the threshold...

(Otherwise, it's unclear whether you are talking about the object or
the cruft pack it is in being larger than the threshold.)
Good suggestion, thanks!
quoted
When this is the case, it is impossible to freshen objects in cruft
pack(s) which are larger than the threshold. This is because we avoid
writing them in the new cruft pack entirely, for a couple of reasons.
...freshen objects in cruft packs when those cruft packs are larger
than the threshold...

Again, just to clarify what thing is "larger".
Likewise, this makes sense as well, and I applied it in my local copy.
Also, this paragraph while fine on its own is slightly unclear whether
you are discussing pre-patch or post-patch state, which when reading
the next two items causes some double takes.  Perhaps just spell it
out slightly clearer here that for the next two enumerated items you
are discussing the existing state previous to your changes?
I adjusted the paragraph before this one to make it a little clearer.
Instead of saying "However, [...]", I rewrote it as "Prior to this
patch, however, [...]".
quoted
 - exists in a non-cruft pack that we are retaining, regardless of that
   pack's mtime, or

 - exists in a cruft pack with an mtime more recent than the copy we are
   debating whether or not to pack, in which case freshening would be
   redundant.
s/more recent than/at least as recent as/ ?
Thanks for the careful read, and yes, the comparison here is a >= rather
than a strict >, and that difference is worth being precise about.
quoted
To do this, keep track of whether or not we have any cruft packs in our
in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft'
flag. When we end up in this new special case, we replace a call to
'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only
reject objects when we have a copy in an existing cruft pack with a more
recent mtime (in which case "freshening" would be redundant).
Again, s/a more recent/at least as recent/ ?
I like this suggestion, but I think the wording ends up a little awkward
if applied as-is. I turned this sentence into:

  [...], and only reject objects when we have a copy in an existing
  cruft pack with at least as recent an mtime as our candidate (in which
  case "freshening" would be redundant).

Let me know what you think!
quoted
+test_expect_success '--max-cruft-size with freshened objects (previously cruft)' '
+       git init max-cruft-size-threshold &&
+       (
+               cd max-cruft-size-threshold &&
+
+               test_commit base &&
+               foo="$(generate_random_blob foo $((2*1024*1024)))" &&
+               bar="$(generate_random_blob bar $((2*1024*1024)))" &&
+               baz="$(generate_random_blob baz $((2*1024*1024)))" &&
+
+               test-tool chmtime --get -100000 \
+                       "$objdir/$(test_oid_to_path "$foo")" >foo.old &&
+               test-tool chmtime --get -100000 \
+                       "$objdir/$(test_oid_to_path "$bar")" >bar.old &&
+               test-tool chmtime --get -100000 \
+                       "$objdir/$(test_oid_to_path "$baz")" >baz.old &&
+
+               git repack --cruft -d &&
+
+               # Make a packed copy of object $foo with a more recent
+               # mtime.
s/$foo/foo/ ?
Eh. $foo holds the OID of that blob, so "foo" on its own doesn't really
mean anything (even though the implicit meaning is clear from context).
I think changing it is fine (leaving it alone is equally fine in my
mind, but I don't feel strongly about it).
quoted
+               foo="$(generate_random_blob foo $((2*1024*1024)))" &&
I thought this was creating a completely different foo, which would
defeat the point of the test.  It might be worth adding a comment that
because generate_random_blob uses a very simplistic and repeatable
random character generator with the first argument as the seed, that
this will regenerate the same loose object as above for foo.
I think the part of the comment which reads "packed copy of" makes it
clear-ish that we're creating an identical copy, but it doesn't hurt to
be more explicit here.

Thanks for the careful read!

Thanks,
Taylor
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help