Thread (16 messages) 16 messages, 4 authors, 2023-12-24

Re: [PATCH] t1006: add tests for %(objectsize:disk)

From: Jeff King <hidden>
Date: 2023-12-23 10:18:54

On Fri, Dec 22, 2023 at 12:13:10AM +0100, René Scharfe wrote:
quoted
quoted
Should we deduplicate here, like cat-file does (i.e. use "sort -u")?
Having the same object in multiple places for whatever reason would not
be a cause for reporting an error in this test, I would think.
No, for the reasons I said in the commit message: if an object exists in
multiple places the test is already potentially invalid, as Git does not
promise which version it will use. So it might work racily, or it might
work for now but be fragile. By not de-duplicating, we make sure the
test's assumption holds.
Oh, skipped that paragraph.  Still I don't see how a duplicate object
would necessarily invalidate t1006.  The comment for the test "cat-file
--batch-all-objects shows all objects" a few lines above indicates that
it's picky about the provenance of objects, but it uses a separate
repository.  I can't infer the same requirement for the root repo, but
we already established that I can't read.
The cat-file documentation explicitly calls this situation out:

  Note also that multiple copies of an object may be present in the
  object database; in this case, it is undefined which copy’s size or
  delta base will be reported.

So if t1006 were to grow such a duplicate object, what will happen? If
we de-dup in the new test, then we might end up mentioning the same copy
(and the test passes), or we might not (and the test fails). But much
worse, the results might be racy (depending on how cat-file happens to
decide which one to use). By no de-duping, then the test will reliably
fail and the author can decide how to handle it then.

IOW it is about failing immediately and predictably rather than letting
a future change to sneak a race or other accident-waiting-to-happen into
t1006.
Anyway, if someone finds a use for git repack without -d or
git unpack-objects or whatever else causes duplicates in the root
repository of t1006 then they can try to reverse your ban with concrete
arguments.
In the real world, the most common way to get a duplicate is to fetch or
push into a repository, such that:

  1. There are enough objects to retain the pack (100 by default)

  2. There's a thin delta in the on-the-wire pack (i.e., a delta against
     a base that the sender knows the receiver has, but which isn't
     itself sent).

Then "index-pack --fix-thin" will complete the on-disk pack by storing a
copy of the base object in it. And now we have it in two packs (and if
it's a delta or loose in the original, the size will be different).

-Peff
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help