Re: [PATCH v5 22/26] t7700: consolidate code into test_no_missing_in_packs()

From: Denton Liu <hidden>
Date: 2019-12-02 20:50:43

Hi Junio,

On Fri, Nov 29, 2019 at 01:39:30PM -0800, Junio C Hamano wrote:

Denton Liu [off-list ref] writes:

quoted

The code to test that objects were not missing from the packfile was
duplicated many times. Extract the duplicated code into
test_no_missing_in_packs() and use that instead.

Refactor the resulting extraction so that if any git commands fail,
their return codes are not silently lost.

We were using sed to filter lines. Although not incorrect, this is
exactly what grep is built for. Replace this invocation of sed with grep
so that we use the correct tool for the job.

Well, 

    $ sed -n -e 's/required match/desired part of the line/p'

is much much more approirate than 

    $ grep -e "requred match" |
      extract desired part of the line

"grep" is better only if the original were

    $ sed -n -e '/required match/p'

but everybody would write it with grep to begin with ;-)

This was what I was intending. It was originally written like the above
and it made sense to convert it to use grep. I guess "filter lines" in
my commit message is a little bit vague. Could we change this to "filter
matching lines" perhaps?

So, I dunno about this part of the conversion.

quoted

Instead of verifying each file of `alt_objects/pack/*.idx` individually
in a for-loop, batch them together into one verification step.

Do you mean this one?

	git verify-pack -v alt_objects/pack/*.idx

where we may pass 1 or more .idx file to the command?  At first my
reading was interrupted by a "Huh?", but that does look good.  We'd
need to be a bit careful to make sure that we have at least 1 .idx
file, as the shell will happily feed a file whose name is "*.idx",
which verify-pack would be unhappy about.

quoted

The original testing construct was O(n^2): it used a grep in a loop to
test whether any objects were missing in the packfile. Rewrite this to
sort the files then use `comm -23` so that finding missing lines from
the original file is done more efficiently.

OK.  If we an show measurable speedups, it would be great, but the
loop structure does look O(n^2) and unnecessary costly.

quoted

+test_no_missing_in_packs () {
+	myidx=$(ls -1 .git/objects/pack/*.idx) &&
+	test_path_is_file "$myidx" &&

If there are 2 or more .idx files, or if there is none, $myidx would
hopefully be a concatenation of these filenames or a string that
ends with asterisk-dot-idx and would fail path_is_file.  Sounds OK.

Ah, I do not have to review this part---these are repeated patterns
in the original.

quoted

+	git verify-pack -v alt_objects/pack/*.idx >orig.raw &&
+	grep "^[0-9a-f]\{40\}" orig.raw | cut -d" " -f1 | sort >orig &&

If output from 'grep' can be used as-is, it is worth doing, but if
you have to pipe it to cut, the original that used sed to filter and
edit the line would probably be a better way to write it.

The original sed actually only filtered the line; no editing done. The
cut invocation was a consequence of using comm. Previously, in the while
loop, we would separate the line into `sha1` and `rest` components and
only match using the `sha1`. Since we use comm now, we have to use cut
to grab the sha1 and omit the rest of the line.

We could rewrite it with sed like this:

	sed -n -e "/^[0-9a-f]\{40\}/s/^\($[0-9a-f]\{40\}\).*/\1/" orig.raw

but I believe that breaking it into grep and cut makes the intent much
more clear. 

What do you think?

(By the way, if I were to reroll this series, should I keep sending out
the entire patchset? It feels very noisy to send out 20-something emails
every reroll when I'm just making a small one or two line change.)

Thanks,

Denton

quoted

+	git verify-pack -v $myidx >dest.raw &&

This part does not quote $myidx" (inherited from the original); it
probably is OK, as any potentially problematic value in $myidx would
have been caught as an error much earlier in this test.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help