Re: [PATCH 2/2] builtin/pack-objects.c: freshen objects from existing cruft packs
From: Taylor Blau <hidden>
Date: 2025-02-27 23:03:12
On Thu, Feb 27, 2025 at 11:26:32AM -0800, Elijah Newren wrote:
quoted
However, this process breaks down when we attempt to freshen an object packed in an earlier cruft pack that is larger than the threshold and thus will survive the repack....packed in an earlier cruft pack, and that cruft pack is larger than the threshold... (Otherwise, it's unclear whether you are talking about the object or the cruft pack it is in being larger than the threshold.)
Good suggestion, thanks!
quoted
When this is the case, it is impossible to freshen objects in cruft pack(s) which are larger than the threshold. This is because we avoid writing them in the new cruft pack entirely, for a couple of reasons....freshen objects in cruft packs when those cruft packs are larger than the threshold... Again, just to clarify what thing is "larger".
Likewise, this makes sense as well, and I applied it in my local copy.
Also, this paragraph while fine on its own is slightly unclear whether you are discussing pre-patch or post-patch state, which when reading the next two items causes some double takes. Perhaps just spell it out slightly clearer here that for the next two enumerated items you are discussing the existing state previous to your changes?
I adjusted the paragraph before this one to make it a little clearer. Instead of saying "However, [...]", I rewrote it as "Prior to this patch, however, [...]".
quoted
- exists in a non-cruft pack that we are retaining, regardless of that pack's mtime, or - exists in a cruft pack with an mtime more recent than the copy we are debating whether or not to pack, in which case freshening would be redundant.s/more recent than/at least as recent as/ ?
Thanks for the careful read, and yes, the comparison here is a >= rather than a strict >, and that difference is worth being precise about.
quoted
To do this, keep track of whether or not we have any cruft packs in our in-core kept list with a new 'ignore_packed_keep_in_core_has_cruft' flag. When we end up in this new special case, we replace a call to 'has_object_kept_pack()' to 'want_cruft_object_mtime()', and only reject objects when we have a copy in an existing cruft pack with a more recent mtime (in which case "freshening" would be redundant).Again, s/a more recent/at least as recent/ ?
I like this suggestion, but I think the wording ends up a little awkward if applied as-is. I turned this sentence into: [...], and only reject objects when we have a copy in an existing cruft pack with at least as recent an mtime as our candidate (in which case "freshening" would be redundant). Let me know what you think!
quoted
+test_expect_success '--max-cruft-size with freshened objects (previously cruft)' ' + git init max-cruft-size-threshold && + ( + cd max-cruft-size-threshold && + + test_commit base && + foo="$(generate_random_blob foo $((2*1024*1024)))" && + bar="$(generate_random_blob bar $((2*1024*1024)))" && + baz="$(generate_random_blob baz $((2*1024*1024)))" && + + test-tool chmtime --get -100000 \ + "$objdir/$(test_oid_to_path "$foo")" >foo.old && + test-tool chmtime --get -100000 \ + "$objdir/$(test_oid_to_path "$bar")" >bar.old && + test-tool chmtime --get -100000 \ + "$objdir/$(test_oid_to_path "$baz")" >baz.old && + + git repack --cruft -d && + + # Make a packed copy of object $foo with a more recent + # mtime.s/$foo/foo/ ?
Eh. $foo holds the OID of that blob, so "foo" on its own doesn't really mean anything (even though the implicit meaning is clear from context). I think changing it is fine (leaving it alone is equally fine in my mind, but I don't feel strongly about it).
quoted
+ foo="$(generate_random_blob foo $((2*1024*1024)))" &&I thought this was creating a completely different foo, which would defeat the point of the test. It might be worth adding a comment that because generate_random_blob uses a very simplistic and repeatable random character generator with the first argument as the seed, that this will regenerate the same loose object as above for foo.
I think the part of the comment which reads "packed copy of" makes it clear-ish that we're creating an identical copy, but it doesn't hurt to be more explicit here. Thanks for the careful read! Thanks, Taylor