Thread (47 messages) 47 messages, 7 authors, 2022-07-01

Re: [PATCH v3 0/5] Avoid spawning gzip in git archive

From: Johannes Schindelin <hidden>
Date: 2022-07-01 16:06:41

Me again,

On Thu, 30 Jun 2022, Johannes Schindelin wrote:
I finally managed to play around with building and packaging zlib-ng
[*1*] (since I want to use it as a drop-in replacement for zlib, I think
it is best to configure it with `--zlib-compat`, that way I do not have
to fiddle with any equivalent of `LD_PRELOAD`). Here are my numbers:

	zlib-ng: 14.409 s ± 0.209 s
	zlib:    26.843 s ± 0.636 s

These are pretty good, which made me think that they might actually even
help regular Git operations (because we zlib every loose object).

So I tried to `fast-import` some 2500 commits from linux.git into a fresh
repository, and the zlib-ng version takes ~51s and the zlib version takes
~58s. At first I thought that it might be noise, but the trend seems to be
steady. It's not a huge improvement, of course, but I think that might be
because most of the time is spent parsing.

I then tried to test the performance focusing on writing loose object, by
using p0008 (increasing the number of files from 50 to 1500 and
restricting it to fsyncMethod=none).

Unfortunately, the numbers are not really conclusive. I do see minor
speed-ups with zlib-ng, mostly, in the single digit percentages, though
occasionally in the other direction. In other words, there is no clear-cut
change, just a vague tendency. My guess: Git writes too small files (their
contents are of the form "$basedir$test_tick.$counter") and zlib-ng's
superior performance does not come to bear.

Still, for larger workloads, zlib-ng seems to offer a quite nice and
substantial performance improvement over zlib.
Stolee pointed out to me that objects inside pack files are also
zlib-compressed, and that measuring the speed of `git rev-list --objects
--all --count` might therefore be a better test.

And this is where things get a little messy: in the context of Git for
Windows, my local measurements indicate that zlib is better, with ~41
seconds using zlib vs ~52 seconds using zlib-ng (but the latter has a
rather large variance).

These measurements were done with a relatively straight-forward build of
zlib-ng v2.0.6, and on a hunch I then tried to build the tip of zlib-ng's
`develop` branch (which was much less straight-forward) and now get
virtually the same speed with that `rev-list` command.

But then I repeated the `archive` measurement with the `develop` version
of zlib-ng, and while it was still substantially faster than zlib, it was
slightly slower than zlib-ng v2.0.6 (zlib: ~26 seconds, zlib-ng v2.0.6:
~14 seconds, zlib-ng develop: ~16 seconds). Still, much, much faster than
using `-c tar.tgz.command="gzip -cn"` at ~24 seconds.

So: the picture is messy. The latest official release of zlib-ng seems to
offer performance wins using `archive` but slight losses using `rev-list.
Upgrading to the latest revision of zlib-ng offers slightly smaller
performance wins using `archive` and equivalent performance using
`rev-list`. Both blow `gzip -cn` out of the water, thanks to using MMX or
whatever my laptop's CPU offers.

The take-away as far as Git for Windows is concerned: It seems not _quite_
the time yet to switch from zlib to zlib-ng, I want to wait until there is
an official zlib-ng release with favorable speed.

Ciao,
Dscho

P.S.: I pushed a WIP update to this branch:
Footnote *1*: https://github.com/msys2/MINGW-packages/compare/master...dscho:zlib-ng
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help