Re: [PATCH 2/1] test-lib-functions: add object size functions

From: René Scharfe <hidden>
Date: 2023-12-19 16:43:02

Am 14.12.23 um 21:59 schrieb Jeff King:

On Wed, Dec 13, 2023 at 01:28:56PM +0100, René Scharfe wrote:

quoted

Add test_object_size and its helpers test_loose_object_size and
test_packed_object_size, which allow determining the size of a Git
object using only the low-level Git commands rev-parse and show-index.

Use it in t6300 to replace the bare-bones function test_object_file_size
as a motivating example.  There it provides the expected output of the
high-level Git command for-each-ref.

This adds a packed-object function, but I doubt anybody actually calls
it. If we're going to do that, it's probably worth adding some tests for
"cat-file --batch-check" or similar.

Yes, and I was assuming that someone else would be eager to add such
tests. *ahem*

At which point I wonder if rather than having a function for a single
object, we are better off just testing the result of:

  git cat-file --batch-all-objects --unordered --batch-check='%(objectsize:disk)'

against a single post-processed "show-index" invocation.

Sure, we might want to optimize for bulk-processing and possibly end up
only checking the size of single objects in t6300, making new library
functions unnecessary.

When dumping size information of multiple objects, it's probably a good
idea to include "%(objectname)" as well in the format.

You'd need one show-index call for each .idx file.  A simple test would
only have a single one; a library function might need to handle multiple
packs.

quoted

So how about this?  I'm a bit nervous about all the rules about output
descriptors and error propagation and whatnot in the test library, but
this implementation seems simple enough and might be useful in more than
one test.  No idea how to add support for alternate object directories,
but I doubt we'll ever need it.

I'm not sure that we need to do anything special with output
redirection. Shouldn't these functions just send errors to stderr as
usual? If they are run inside a test_expect block, that goes to
descriptor 4 (which is either /dev/null or the original stderr,
depending on whether "-v" was used).

Good point.  My bad excuse is that I copied the redirection to fd 4 from
test_grep.

quoted

+	git show-index <"$idx" |
+	awk -v oid="$oid" -v end="$end" '
+		$2 == oid {start = $1}
+		{offsets[$1] = 1}
+		END {
+			if (!start || start >= end)
+				exit 1
+			for (o in offsets)
+				if (start < o && o < end)
+					end = o
+			print end - start
+		}
+	' && return 0

I was confused at first, because I didn't see any sorting happening. But
if I understand correctly, you're just looking for the smallest "end"
that comes after the start of the object we're looking for. Which I
think works.

Yes, calculating the minimum offset suffices when handling a single
object -- no sorting required.  For bulk mode we'd better sort, of
course:

	git show-index <"$idx" |
	sort -n |
	awk -v end="$end" '
		NR > 1 {print oid, $1 - start}
		{start = $1; oid = $2}
		END {print oid, end - start}
	'

No idea how to make such a thing robust against malformed or truncated
output from show-index, but perhaps that's not necessary, depending on
how the result is used.

René

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help