Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel
From: Eric Sunshine <hidden>
Date: 2022-11-21 04:03:12
On Tue, Sep 6, 2022 at 7:27 PM Jeff King [off-list ref] wrote:
I did some timings the other night, and I found something quite curious
with the thread stuff.
I was quite surprised that it made things slower! It's nice that we're
only calling it once per script instead of once per test, but it seems
the startup overhead of the script is really high.
And since in this mode we're only feeding it one script at a time, I
tried reverting the "chainlint.pl: validate test scripts in parallel"
commit. And indeed, now things are much faster:
Benchmark 1: make
Time (mean ± σ): 61.544 s ± 3.364 s [User: 556.486 s, System: 384.001 s]
Range (min … max): 57.660 s … 63.490 s 3 runs
And you can see the same thing just running chainlint by itself:
$ time perl chainlint.pl /dev/null
real 0m0.069s
user 0m0.042s
sys 0m0.020s
$ git revert HEAD^{/validate.test.scripts.in.parallel}
$ time perl chainlint.pl /dev/null
real 0m0.014s
user 0m0.010s
sys 0m0.004s
I didn't track down the source of the slowness. Maybe it's loading extra
modules, or maybe it's opening /proc/cpuinfo, or maybe it's the thread
setup. But it's a surprising slowdown.
It is surprising, and unfortunate. Ditching "ithreads" would probably be a good idea (more on that below).
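For a sense of scale, the two "time" runs above differ by about 55 ms of wall-clock per invocation. Here is a back-of-the-envelope sketch in Python of what that adds up to when chainlint.pl is started once per script. The script count of 1000 is a made-up round number for illustration, not a measurement, and the function name is hypothetical:

```python
# Per-invocation startup cost, taken from the two `time` runs above ("real").
STARTUP_WITH_THREADS = 0.069   # seconds, parallel code present
STARTUP_WITHOUT = 0.014        # seconds, parallel commit reverted


def extra_startup_seconds(num_invocations):
    """Summed extra startup time if chainlint.pl is started once per script."""
    return (STARTUP_WITH_THREADS - STARTUP_WITHOUT) * num_invocations


# ASSUMPTION: 1000 is a hypothetical round number of test scripts,
# chosen only to show the scale; it is not a count taken from the project.
hypothetical_scripts = 1000
total = extra_startup_seconds(hypothetical_scripts)
print(f"{total:.1f} s of extra startup across {hypothetical_scripts} invocations")
```

Under a parallel "make" that total is spread across cores, so the wall-clock cost is smaller than the sum, but it is still CPU time spent on nothing but startup.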
Now of course your intent is to do a single repo-wide invocation. And
that is indeed a bit faster. Here it is without the parallel code:
Benchmark 1: make
Time (mean ± σ): 61.727 s ± 2.140 s [User: 507.712 s, System: 377.753 s]
Range (min … max): 59.259 s … 63.074 s 3 runs
The wall-clock time didn't improve much, but the CPU time did. Restoring
the parallel code does improve the wall-clock time a bit, but at the
cost of some extra CPU:
Benchmark 1: make
Time (mean ± σ): 59.029 s ± 2.851 s [User: 515.690 s, System: 380.369 s]
Range (min … max): 55.736 s … 60.693 s 3 runs
which makes sense. If I do a with/without of just "make test-chainlint",
the parallelism is buying a few seconds of wall-clock:
Benchmark 1: make test-chainlint
Time (mean ± σ): 900.1 ms ± 102.9 ms [User: 12049.8 ms, System: 79.7 ms]
Range (min … max): 704.2 ms … 994.4 ms 10 runs
Benchmark 1: make test-chainlint
Time (mean ± σ): 3.778 s ± 0.042 s [User: 3.756 s, System: 0.023 s]
Range (min … max): 3.706 s … 3.833 s 10 runs
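To put numbers on that trade-off, here is plain arithmetic on the two reported means, written as a small Python sketch. I am reading the first result as the parallel run, since its user time far exceeds its wall-clock time; the variable names are mine:

```python
# Mean timings reported above for "make test-chainlint", in seconds.
parallel = {"wall": 0.9001, "user": 12.0498}  # assumed: with the parallel code
serial = {"wall": 3.778, "user": 3.756}       # assumed: parallel commit reverted

# How much sooner the parallel run finishes.
wall_speedup = serial["wall"] / parallel["wall"]
# How much more user CPU the parallel run burns to get there.
user_cost = parallel["user"] / serial["user"]

print(f"wall-clock speedup: {wall_speedup:.1f}x")
print(f"user CPU cost:      {user_cost:.1f}x")
```

So the parallel run finishes roughly 4.2 times sooner while spending roughly 3.2 times the user CPU.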
I'm not sure what it all means. For Linux, I think I'd be just as happy
with a single non-parallelized test-chainlint run for each file. But
maybe on Windows the startup overhead is worse? OTOH, the whole test run
is so much worse there. One process per script is not going to be that
much in relative terms either way.
Somehow Windows manages to be unbelievably slow no matter what. I mentioned elsewhere (after you sent this) that I tested on a five- or six-year-old 8-core dual-boot machine. Booted to Linux, a single chainlint.pl invocation using all 8 cores to check all scripts in the project took under 1 second of wall-clock time. The same machine booted to Windows, again using all 8 cores, took just under two minutes(!) of wall-clock time for the single Perl invocation to check all scripts in the project. So, at this point, I have no hope of making linting fast on Windows; it seems to be a lost cause.
And if we did cache the results and avoid extra invocations via "make", then we'd want all the parallelism to move to there anyway. Maybe that gives you more food for thought about whether perl's "use threads" is worth having.
I'm not especially happy about the significant overhead of "ithreads"; on my (old) machine, although it does improve perceived time significantly, it eats up quite a bit of additional user-time. As such, I would not be unhappy to see "ithreads" go away, especially since fast linting on Windows seems unattainable (at least with Perl). Overall, I think Ævar's plan to parallelize linting via "make" is probably the way to go.
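To make the "one single-threaded process per script, with the parallelism supplied from outside" idea concrete, here is a minimal Python sketch of its shape. It is not how "make" or Ævar's patches do it: lint_each, its parameters, and the no-op command in the usage example are all hypothetical, and a thread pool merely stands in for "make -j":

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def lint_each(cmd_prefix, paths, jobs):
    """Run one linter process per path, at most `jobs` at a time.

    Returns a dict mapping each path to that process's exit code.
    The linter itself stays single-threaded; the pool plays the
    role that "make -j" would play.
    """
    def run(path):
        # Each path gets its own process, appended as the final argument.
        return path, subprocess.run(cmd_prefix + [path]).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(pool.map(run, paths))


if __name__ == "__main__":
    # ASSUMPTION: a no-op interpreter invocation stands in for the real
    # linter command, and the script names are placeholders that are
    # passed as arguments but never opened.
    noop = [sys.executable, "-c", "pass"]
    print(lint_each(noop, ["t0000-basic.sh", "t0001-init.sh"], jobs=2))
```

The point of this shape is that per-process startup cost becomes the dominant overhead, which is exactly why the "ithreads" startup penalty hurts in this mode.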