Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel
From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2022-09-01 12:48:10
On Thu, Sep 01 2022, Eric Sunshine via GitGitGadget wrote:
From: Eric Sunshine <redacted> Although chainlint.pl has undergone a good deal of optimization during its development -- increasing in speed significantly -- parsing and validating 1050+ scripts and 16500+ tests via Perl is not exactly instantaneous. However, perceived performance can be improved by taking advantage of the fact that there is no interdependence between test scripts or test definitions, thus parsing and validating can be done in parallel. The number of available cores is determined automatically but can be overridden via the --jobs option.
Per your CL: Ævar offered some sensible comments[2,3] about optimizing the Makefile rules related to chainlint, but those optimizations are not tackled here for a few reasons: (1) this series is already quite long, (2) I'd like to keep the series focused on its primary goal of installing a new and improved linter, (3) these patches do not make the Makefile situation any worse[4], and (4) those optimizations can easily be done atop this series[5]. I have been running with those t/Makefile changesg locally, but didn't submit them. FWIW that's here: https://github.com/git/git/compare/master...avar:git:avar/t-Makefile-use-dependency-graph-for-check-chainlint Which I'm not entirely sure I'm happy about, and it's jeust about the chainlint tests, but...
quoted hunk ↗ jump to hunk
+sub ncores { + # Windows + return $ENV{NUMBER_OF_PROCESSORS} if exists($ENV{NUMBER_OF_PROCESSORS}); + # Linux / MSYS2 / Cygwin / WSL + do { local @ARGV='/proc/cpuinfo'; return scalar(grep(/^processor\s*:/, <>)); } if -r '/proc/cpuinfo'; + # macOS & BSD + return qx/sysctl -n hw.ncpu/ if $^O =~ /(?:^darwin$|bsd)/; + return 1; +} + sub show_stats { my ($start_time, $stats) = @_; my $walltime = $interval->($start_time);@@ -621,7 +633,9 @@ sub exit_code { Getopt::Long::Configure(qw{bundling}); GetOptions( "emit-all!" => \$emit_all, + "jobs|j=i" => \$jobs, "stats|show-stats!" => \$show_stats) or die("option error\n"); +$jobs = ncores() if $jobs < 1; my $start_time = $getnow->(); my @stats;@@ -633,6 +647,40 @@ unless (@scripts) { exit; } -push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); })); +unless ($Config{useithreads} && eval { + require threads; threads->import(); + require Thread::Queue; Thread::Queue->import(); + 1; + }) { + push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); })); + show_stats($start_time, \@stats) if $show_stats; + exit(exit_code(\@stats)); +} + +my $script_queue = Thread::Queue->new(); +my $output_queue = Thread::Queue->new(); + +sub next_script { return $script_queue->dequeue(); } +sub emit { $output_queue->enqueue(@_); } + +sub monitor { + while (my $s = $output_queue->dequeue()) { + print($s); + } +} + +my $mon = threads->create({'context' => 'void'}, \&monitor); +threads->create({'context' => 'list'}, \&check_script, $_, \&next_script, \&emit) for 1..$jobs; + +$script_queue->enqueue(@scripts); +$script_queue->end(); + +for (threads->list()) { + push(@stats, $_->join()) unless $_ == $mon; +} + +$output_queue->end(); +$mon->join();
Maybe I'm misunderstanding this whole thing, but this really seems like
the wrong direction in an otherwise fantastic direction of a series.
I.e. it's *great* that we can do chain-lint without needing to actually
execute the *.sh file, this series adds a lint parser that can parse
those *.sh "at rest".
But in your 16/18 you then do:
+if test "${GIT_TEST_CHAIN_LINT:-1}" != 0
+then
+ "$PERL_PATH" "$TEST_DIRECTORY/chainlint.pl" "$0" ||
+ BUG "lint error (see '?!...!? annotations above)"
+fi
I may just be missing something here, but why not instead just borrow
what I did for "lint-docs" in 8650c6298c1 (doc lint: make "lint-docs"
non-.PHONY, 2021-10-15)?
I.e. if we can run against t0001-init.sh or whatever *once* to see if it
chain-lints OK then surely we could have a rule like:
t0001-init.sh.chainlint-ok: t0001-init.sh
perl chainlint.pl $< >$@
Then whenever you change t0001-init.sh we refresh that
t0001-init.sh.chainlint-ok, if the chainlint.pl exits non-zero we'll
fail to make it, and will unlink that t0001-init.sh.chainlint-ok.
That way you wouldn't need any parallelism in the Perl script, because
you'd have "make" take care of it, and the common case of re-testing
where the speed matters would be that we woudln't need to run this at
all, or would only re-run it for the test scripts that changed.
(Obviously a "real" implementation would want to create that ".ok" file
in t/.build/chainlint" or whatever)
A drawback is that you'd probably be slower on the initial run, as you'd
spwn N chainlint.pl. You could use $? instead of $< to get around that,
but that requires some re-structuring, and I've found it to generally
not be worth it.
It would also have the drawback that a:
./t0001-init.sh
wouldn't run the chain-lint, but this would:
make T=t0001-init.sh
But if want the former to work we could carry some
"GIT_TEST_VIA_MAKEFILE" variable or whatever, and only run the
test-via-test-lib.sh if it isn't set.