Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel

[PATCH 00/18] make test "linting" more comprehensive · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 01/18] t: add skeleton chainlint.pl · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
Re: [PATCH 01/18] t: add skeleton chainlint.pl · Ævar Arnfjörð Bjarmason <hidden> · 2022-09-01
Re: [PATCH 01/18] t: add skeleton chainlint.pl · Eric Sunshine <hidden> · 2022-09-02
[PATCH 02/18] chainlint.pl: add POSIX shell lexical analyzer · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
Re: [PATCH 02/18] chainlint.pl: add POSIX shell lexical analyzer · Ævar Arnfjörð Bjarmason <hidden> · 2022-09-01
Re: [PATCH 02/18] chainlint.pl: add POSIX shell lexical analyzer · Eric Sunshine <hidden> · 2022-09-03
[PATCH 04/18] chainlint.pl: add parser to validate tests · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 03/18] chainlint.pl: add POSIX shell parser · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 06/18] chainlint.pl: validate test scripts in parallel · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Ævar Arnfjörð Bjarmason <hidden> · 2022-09-01
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Eric Sunshine <hidden> · 2022-09-03
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Eric Wong <hidden> · 2022-09-06
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Eric Sunshine <hidden> · 2022-09-06
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Jeff King <hidden> · 2022-09-06
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Eric Sunshine <hidden> · 2022-11-21
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Ævar Arnfjörð Bjarmason <hidden> · 2022-11-21
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Eric Sunshine <hidden> · 2022-11-21
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Ævar Arnfjörð Bjarmason <hidden> · 2022-11-21
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Eric Sunshine <hidden> · 2022-11-21
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Jeff King <hidden> · 2022-11-21
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Eric Sunshine <hidden> · 2022-11-21
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Eric Sunshine <hidden> · 2022-11-21
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Jeff King <hidden> · 2022-11-21
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Eric Sunshine <hidden> · 2022-11-21
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Jeff King <hidden> · 2022-11-21
Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel · Ævar Arnfjörð Bjarmason <hidden> · 2022-11-22
[PATCH 05/18] chainlint.pl: add parser to identify test definitions · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 07/18] chainlint.pl: don't require `return|exit|continue` to end with `&&` · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 10/18] chainlint.pl: don't flag broken &&-chain if `$?` handled explicitly · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 12/18] chainlint.pl: complain about loops lacking explicit failure handling · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 09/18] chainlint.pl: don't require `&` background command to end with `&&` · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 08/18] t/Makefile: apply chainlint.pl to existing self-tests · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 11/18] chainlint.pl: don't flag broken &&-chain if failure indicated explicitly · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 13/18] chainlint.pl: allow `|| echo` to signal failure upstream of a pipe · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 14/18] t/chainlint: add more chainlint.pl self-tests · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 15/18] test-lib: retire "lint harder" optimization hack · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
[PATCH 16/18] test-lib: replace chainlint.sed with chainlint.pl · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
Re: [PATCH 16/18] test-lib: replace chainlint.sed with chainlint.pl · Elijah Newren <hidden> · 2022-09-03
Re: [PATCH 16/18] test-lib: replace chainlint.sed with chainlint.pl · Eric Sunshine <hidden> · 2022-09-03
[PATCH 18/18] t: retire unused chainlint.sed · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
Re: several messages · Johannes Schindelin <hidden> · 2022-09-02
Re: several messages · Eric Sunshine <hidden> · 2022-09-02
Re: several messages · Jeff King <hidden> · 2022-09-02
[PATCH 17/18] t/Makefile: teach `make test` and `make prove` to run chainlint.pl · Eric Sunshine via GitGitGadget <hidden> · 2022-09-01
Re: [PATCH 00/18] make test "linting" more comprehensive · Jeff King <hidden> · 2022-09-11
Re: [PATCH 00/18] make test "linting" more comprehensive · Eric Sunshine <hidden> · 2022-09-11
Re: [PATCH 00/18] make test "linting" more comprehensive · Jeff King <hidden> · 2022-09-11
Re: [PATCH 00/18] make test "linting" more comprehensive · Eric Sunshine <hidden> · 2022-09-12
Re: [PATCH 00/18] make test "linting" more comprehensive · Jeff King <hidden> · 2022-09-13

From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2022-09-01 12:48:10

On Thu, Sep 01 2022, Eric Sunshine via GitGitGadget wrote:

From: Eric Sunshine <redacted>

Although chainlint.pl has undergone a good deal of optimization during
its development -- increasing in speed significantly -- parsing and
validating 1050+ scripts and 16500+ tests via Perl is not exactly
instantaneous. However, perceived performance can be improved by taking
advantage of the fact that there is no interdependence between test
scripts or test definitions, thus parsing and validating can be done in
parallel. The number of available cores is determined automatically but
can be overridden via the --jobs option.

Per your CL:
	
	Ævar offered some sensible comments[2,3] about optimizing the Makefile rules
	related to chainlint, but those optimizations are not tackled here for a few
	reasons: (1) this series is already quite long, (2) I'd like to keep the
	series focused on its primary goal of installing a new and improved linter,
	(3) these patches do not make the Makefile situation any worse[4], and (4)
	those optimizations can easily be done atop this series[5].

I have been running with those t/Makefile changesg locally, but didn't
submit them. FWIW that's here:

	https://github.com/git/git/compare/master...avar:git:avar/t-Makefile-use-dependency-graph-for-check-chainlint

Which I'm not entirely sure I'm happy about, and it's jeust about the
chainlint tests, but...

quoted hunk ↗ jump to hunk

+sub ncores {
+	# Windows
+	return $ENV{NUMBER_OF_PROCESSORS} if exists($ENV{NUMBER_OF_PROCESSORS});
+	# Linux / MSYS2 / Cygwin / WSL
+	do { local @ARGV='/proc/cpuinfo'; return scalar(grep(/^processor\s*:/, <>)); } if -r '/proc/cpuinfo';
+	# macOS & BSD
+	return qx/sysctl -n hw.ncpu/ if $^O =~ /(?:^darwin$|bsd)/;
+	return 1;
+}
+
 sub show_stats {
 	my ($start_time, $stats) = @_;
 	my $walltime = $interval->($start_time);

@@ -621,7 +633,9 @@ sub exit_code {
 Getopt::Long::Configure(qw{bundling});
 GetOptions(
 	"emit-all!" => \$emit_all,
+	"jobs|j=i" => \$jobs,
 	"stats|show-stats!" => \$show_stats) or die("option error\n");
+$jobs = ncores() if $jobs < 1;
 
 my $start_time = $getnow->();
 my @stats;
@@ -633,6 +647,40 @@ unless (@scripts) {
 	exit;
 }
 
-push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); }));
+unless ($Config{useithreads} && eval {
+	require threads; threads->import();
+	require Thread::Queue; Thread::Queue->import();
+	1;
+	}) {
+	push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); }));
+	show_stats($start_time, \@stats) if $show_stats;
+	exit(exit_code(\@stats));
+}
+
+my $script_queue = Thread::Queue->new();
+my $output_queue = Thread::Queue->new();
+
+sub next_script { return $script_queue->dequeue(); }
+sub emit { $output_queue->enqueue(@_); }
+
+sub monitor {
+	while (my $s = $output_queue->dequeue()) {
+		print($s);
+	}
+}
+
+my $mon = threads->create({'context' => 'void'}, \&monitor);
+threads->create({'context' => 'list'}, \&check_script, $_, \&next_script, \&emit) for 1..$jobs;
+
+$script_queue->enqueue(@scripts);
+$script_queue->end();
+
+for (threads->list()) {
+	push(@stats, $_->join()) unless $_ == $mon;
+}
+
+$output_queue->end();
+$mon->join();

Maybe I'm misunderstanding this whole thing, but this really seems like
the wrong direction in an otherwise fantastic direction of a series.

I.e. it's *great* that we can do chain-lint without needing to actually
execute the *.sh file, this series adds a lint parser that can parse
those *.sh "at rest".

But in your 16/18 you then do:
	
	+if test "${GIT_TEST_CHAIN_LINT:-1}" != 0
	+then
	+	"$PERL_PATH" "$TEST_DIRECTORY/chainlint.pl" "$0" ||
	+		BUG "lint error (see '?!...!? annotations above)"
	+fi
	
I may just be missing something here, but why not instead just borrow
what I did for "lint-docs" in 8650c6298c1 (doc lint: make "lint-docs"
non-.PHONY, 2021-10-15)?

I.e. if we can run against t0001-init.sh or whatever *once* to see if it
chain-lints OK then surely we could have a rule like:

	t0001-init.sh.chainlint-ok: t0001-init.sh
		perl chainlint.pl $< >$@

Then whenever you change t0001-init.sh we refresh that
t0001-init.sh.chainlint-ok, if the chainlint.pl exits non-zero we'll
fail to make it, and will unlink that t0001-init.sh.chainlint-ok.

That way you wouldn't need any parallelism in the Perl script, because
you'd have "make" take care of it, and the common case of re-testing
where the speed matters would be that we woudln't need to run this at
all, or would only re-run it for the test scripts that changed.

(Obviously a "real" implementation would want to create that ".ok" file
in t/.build/chainlint" or whatever)

A drawback is that you'd probably be slower on the initial run, as you'd
spwn N chainlint.pl. You could use $? instead of $< to get around that,
but that requires some re-structuring, and I've found it to generally
not be worth it.

It would also have the drawback that a:

	./t0001-init.sh

wouldn't run the chain-lint, but this would:

	make T=t0001-init.sh

But if want the former to work we could carry some
"GIT_TEST_VIA_MAKEFILE" variable or whatever, and only run the
test-via-test-lib.sh if it isn't set.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help