Re: [PATCH 2/1] test-lib-functions: add object size functions
From: René Scharfe <hidden>
Date: 2023-12-19 16:43:02
Am 14.12.23 um 21:59 schrieb Jeff King:
On Wed, Dec 13, 2023 at 01:28:56PM +0100, René Scharfe wrote:quoted
Add test_object_size and its helpers test_loose_object_size and test_packed_object_size, which allow determining the size of a Git object using only the low-level Git commands rev-parse and show-index. Use it in t6300 to replace the bare-bones function test_object_file_size as a motivating example. There it provides the expected output of the high-level Git command for-each-ref.This adds a packed-object function, but I doubt anybody actually calls it. If we're going to do that, it's probably worth adding some tests for "cat-file --batch-check" or similar.
Yes, and I was assuming that someone else would be eager to add such tests. *ahem*
At which point I wonder if rather than having a function for a single object, we are better off just testing the result of: git cat-file --batch-all-objects --unordered --batch-check='%(objectsize:disk)' against a single post-processed "show-index" invocation.
Sure, we might want to optimize for bulk-processing and possibly end up only checking the size of single objects in t6300, making new library functions unnecessary. When dumping size information of multiple objects, it's probably a good idea to include "%(objectname)" as well in the format. You'd need one show-index call for each .idx file. A simple test would only have a single one; a library function might need to handle multiple packs.
quoted
So how about this? I'm a bit nervous about all the rules about output descriptors and error propagation and whatnot in the test library, but this implementation seems simple enough and might be useful in more than one test. No idea how to add support for alternate object directories, but I doubt we'll ever need it.I'm not sure that we need to do anything special with output redirection. Shouldn't these functions just send errors to stderr as usual? If they are run inside a test_expect block, that goes to descriptor 4 (which is either /dev/null or the original stderr, depending on whether "-v" was used).
Good point. My bad excuse is that I copied the redirection to fd 4 from test_grep.
quoted
+ git show-index <"$idx" | + awk -v oid="$oid" -v end="$end" ' + $2 == oid {start = $1} + {offsets[$1] = 1} + END { + if (!start || start >= end) + exit 1 + for (o in offsets) + if (start < o && o < end) + end = o + print end - start + } + ' && return 0I was confused at first, because I didn't see any sorting happening. But if I understand correctly, you're just looking for the smallest "end" that comes after the start of the object we're looking for. Which I think works.
Yes, calculating the minimum offset suffices when handling a single
object -- no sorting required. For bulk mode we'd better sort, of
course:
git show-index <"$idx" |
sort -n |
awk -v end="$end" '
NR > 1 {print oid, $1 - start}
{start = $1; oid = $2}
END {print oid, end - start}
'
No idea how to make such a thing robust against malformed or truncated
output from show-index, but perhaps that's not necessary, depending on
how the result is used.
René