Re: [PATCH v3 0/5] Avoid spawning gzip in git archive
From: Johannes Schindelin <hidden>
Date: 2022-06-30 18:56:12
Hi René,

On Tue, 14 Jun 2022, René Scharfe wrote:

> On 14.06.22 at 13:28, Johannes Schindelin wrote:
>
> > By the way, the main reason why I did not work more is that in
> > http://madler.net/pipermail/zlib-devel_madler.net/2019-December/003308.html,
> > Mark Adler (the zlib maintainer) announced that...
> >
> > > [...] There are many well-tested performance improvements in zlib
> > > waiting in the wings that will be incorporated over the next several
> > > months. [...]
> >
> > This was in December 2019. And now it's June 2022 and I kind of wonder
> > whether those promised improvements will still come.
> >
> > In the meantime, however, a viable alternative seems to have cropped up:
> > https://github.com/zlib-ng/zlib-ng. Essentially, it looks as if it is
> > what zlib should have become after the above-quoted announcement.
> >
> > In particular the CPU intrinsics support (think MMX, SSE2/3, etc) seems
> > to be very interesting and I would not be completely surprised if
> > building Git with your patches and linking against zlib-ng would paint a
> > very favorable picture not only in terms of CPU time but also in terms
> > of wallclock time.
> >
> > Sadly, I have not been able to set aside time to look into that angle,
> > but maybe I can pique your interest?
>
> I was unable to preload zlib-ng using DYLD_INSERT_LIBRARIES on macOS
> 12.4 so far. The included demo proggy looks impressive, though:
>
>   $ hyperfine -w3 -L gzip gzip,../zlib-ng/minigzip "git -C ../linux archive --format=tar HEAD | {gzip} -c"
>   Benchmark #1: git -C ../linux archive --format=tar HEAD | gzip -c
>     Time (mean ± σ):     20.424 s ±  0.006 s    [User: 23.964 s, System: 0.432 s]
>     Range (min … max):   20.414 s … 20.434 s    10 runs
>
>   Benchmark #2: git -C ../linux archive --format=tar HEAD | ../zlib-ng/minigzip -c
>     Time (mean ± σ):     12.158 s ±  0.006 s    [User: 13.908 s, System: 0.376 s]
>     Range (min … max):   12.145 s … 12.166 s    10 runs
>
>   Summary
>     'git -C ../linux archive --format=tar HEAD | ../zlib-ng/minigzip -c' ran
>       1.68 ± 0.00 times faster than 'git -C ../linux archive --format=tar HEAD | gzip -c'

Intriguing.

I finally managed to play around with building and packaging zlib-ng [*1*]
(since I want to use it as a drop-in replacement for zlib, I think it is
best to configure it with `--zlib-compat`, that way I do not have to
fiddle with any equivalent of `LD_PRELOAD`).

Here are my numbers:

	zlib-ng: 14.409 s ± 0.209 s
	zlib:    26.843 s ± 0.636 s

These are pretty good, which made me think that they might actually even
help regular Git operations (because we zlib every loose object).

So I tried to `fast-import` some 2500 commits from linux.git into a fresh
repository, and the zlib-ng version takes ~51s and the zlib version takes
~58s. At first I thought that it might be noise, but the trend seems to be
steady. It's not a huge improvement, of course, but I think that might be
because most of the time is spent parsing.

I then tried to test the performance focusing on writing loose objects, by
using p0008 (increasing the number of files from 50 to 1500 and
restricting it to fsyncMethod=none). Unfortunately, the numbers are not
really conclusive. I do see minor speed-ups with zlib-ng, mostly in the
single-digit percentages, though occasionally the result goes in the other
direction. In other words, there is no clear-cut change, just a vague
tendency.

My guess: the files Git writes in that test are too small (their contents
are of the form "$basedir$test_tick.$counter"), so zlib-ng's superior
performance does not come to bear.

Still, for larger workloads, zlib-ng seems to offer a quite nice and
substantial performance improvement over zlib.

Ciao,
Dscho

Footnote *1*:
https://github.com/msys2/MINGW-packages/compare/master...dscho:zlib-ng
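
For anyone who wants to compare the timings in this thread side by side, the
speed-up factors can be recomputed with a few lines of shell. This is only a
rough sketch: `ratio` is a made-up helper name, the inputs are simply the mean
times quoted above, and the fast-import figures are approximate to begin with.

```shell
#!/bin/sh
# ratio SLOW FAST: print how many times faster FAST is than SLOW.
# (Hypothetical helper, not part of Git or hyperfine.)
ratio () {
	awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

ratio 20.424 12.158   # git archive piped to gzip vs. zlib-ng's minigzip
ratio 26.843 14.409   # the zlib vs. zlib-ng timings reported above
ratio 58 51           # fast-import of ~2500 commits, zlib vs. zlib-ng
```

The first call reproduces the 1.68 that hyperfine prints in its summary; the
other two put the remaining measurements on the same scale.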