Re: [PATCH v5 22/26] t7700: consolidate code into test_no_missing_in_packs()
From: Denton Liu <hidden>
Date: 2019-12-02 20:50:43
Hi Junio, On Fri, Nov 29, 2019 at 01:39:30PM -0800, Junio C Hamano wrote:
Denton Liu [off-list ref] writes:quoted
The code to test that objects were not missing from the packfile was duplicated many times. Extract the duplicated code into test_no_missing_in_packs() and use that instead. Refactor the resulting extraction so that if any git commands fail, their return codes are not silently lost. We were using sed to filter lines. Although not incorrect, this is exactly what grep is built for. Replace this invocation of sed with grep so that we use the correct tool for the job.Well, $ sed -n -e 's/required match/desired part of the line/p' is much much more approirate than $ grep -e "requred match" | extract desired part of the line "grep" is better only if the original were $ sed -n -e '/required match/p' but everybody would write it with grep to begin with ;-)
This was what I was intending. It was originally written like the above and it made sense to convert it to use grep. I guess "filter lines" in my commit message is a little bit vague. Could we change this to "filter matching lines" perhaps?
So, I dunno about this part of the conversion.quoted
Instead of verifying each file of `alt_objects/pack/*.idx` individually in a for-loop, batch them together into one verification step.Do you mean this one? git verify-pack -v alt_objects/pack/*.idx where we may pass 1 or more .idx file to the command? At first my reading was interrupted by a "Huh?", but that does look good. We'd need to be a bit careful to make sure that we have at least 1 .idx file, as the shell will happily feed a file whose name is "*.idx", which verify-pack would be unhappy about.quoted
The original testing construct was O(n^2): it used a grep in a loop to test whether any objects were missing in the packfile. Rewrite this to sort the files then use `comm -23` so that finding missing lines from the original file is done more efficiently.OK. If we an show measurable speedups, it would be great, but the loop structure does look O(n^2) and unnecessary costly.quoted
+test_no_missing_in_packs () { + myidx=$(ls -1 .git/objects/pack/*.idx) && + test_path_is_file "$myidx" &&If there are 2 or more .idx files, or if there is none, $myidx would hopefully be a concatenation of these filenames or a string that ends with asterisk-dot-idx and would fail path_is_file. Sounds OK. Ah, I do not have to review this part---these are repeated patterns in the original.quoted
+ git verify-pack -v alt_objects/pack/*.idx >orig.raw && + grep "^[0-9a-f]\{40\}" orig.raw | cut -d" " -f1 | sort >orig &&If output from 'grep' can be used as-is, it is worth doing, but if you have to pipe it to cut, the original that used sed to filter and edit the line would probably be a better way to write it.
The original sed actually only filtered the line; no editing done. The
cut invocation was a consequence of using comm. Previously, in the while
loop, we would separate the line into `sha1` and `rest` components and
only match using the `sha1`. Since we use comm now, we have to use cut
to grab the sha1 and omit the rest of the line.
We could rewrite it with sed like this:
sed -n -e "/^[0-9a-f]\{40\}/s/^\($[0-9a-f]\{40\}\).*/\1/" orig.raw
but I believe that breaking it into grep and cut makes the intent much
more clear.
What do you think?
(By the way, if I were to reroll this series, should I keep sending out
the entire patchset? It feels very noisy to send out 20-something emails
every reroll when I'm just making a small one or two line change.)
Thanks,
Denton
quoted
+ git verify-pack -v $myidx >dest.raw &&This part does not quote $myidx" (inherited from the original); it probably is OK, as any potentially problematic value in $myidx would have been caught as an error much earlier in this test.