Thread (46 messages), 10 authors, 2023-02-06

Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution

From: demerphq <hidden>
Date: 2023-02-01 15:22:51

On Wed, 1 Feb 2023 at 14:49, Ævar Arnfjörð Bjarmason [off-list ref] wrote:

On Wed, Feb 01 2023, demerphq wrote:
On Wed, 1 Feb 2023, 20:21 Michal Suchánek, [off-list ref] wrote:
On Wed, Feb 01, 2023 at 12:34:06PM +0100, demerphq wrote:
Why does it have to be gzip? It is not that hard to come up with a
historical reasons?
Currently git doesn't advertise that archive creation is stable
right[1]? So I wrote that with the assumption that this new
compression would only be used when making a new archive with a
hypothetical new '--stable' option. So historical reasons don't come
up. Or was there some other form of history that you meant?
We haven't advertised it, but people have come to rely on it, as the
widespread breakages reported when upgrading to v2.38.0 at the start of
this thread show.

That's unfortunate, and those people probably shouldn't have done that,
but that's water under the bridge. I think it would be irresponsible to
change the output willy-nilly at this point, especially when it seems
rather easy to find some compromise everyone will be happy with.
I'm just trying to point out here that stable compression is doable
and doesn't need to be as complex as specifying a stable gzip format.
I am not even saying git should just do this, just that it /could/ if
it decided that stability was important, and that doing so wouldn't
involve the complexity that Avar was implying would be needed. Simple
compression schemes like LZ variants are pretty straightforward to
implement, achieve pretty good compression, and can run pretty fast.
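
To make the "straightforward to implement" point concrete, here is a toy
LZ77-style codec. It is only a sketch: the token layout, the constants and
the names lz_compress / lz_decompress are invented for illustration and are
not anything git uses or anything proposed in this thread. The point is that
when the match search is fully specified (fixed window, nearest longest
match wins), the same input always yields the same bytes.

```python
# Toy LZ77-style codec. Hypothetical format, for illustration only:
#   literal token: 0x00, byte
#   match token:   0x01, distance (1..255), length (3..255)
WINDOW = 255      # how far back a match may start (fits in one byte)
MIN_MATCH = 3     # shorter matches are cheaper to emit as literals
MAX_MATCH = 255   # longest match a single token can describe


def lz_compress(data: bytes) -> bytes:
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        best_len, best_dist = 0, 0
        # Scan candidates from nearest to farthest. The strict '>' below
        # keeps the nearest of the longest matches, so ties always resolve
        # the same way and the output is deterministic.
        for j in range(i - 1, max(0, i - WINDOW) - 1, -1):
            length = 0
            while (length < MAX_MATCH and i + length < n
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= MIN_MATCH:
            out += bytes((1, best_dist, best_len))
            i += best_len
        else:
            out += bytes((0, data[i]))
            i += 1
    return bytes(out)


def lz_decompress(blob: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] == 0:
            out.append(blob[i + 1])
            i += 2
        else:
            dist, length = blob[i + 1], blob[i + 2]
            # Copy one byte at a time so overlapping matches work.
            for _ in range(length):
                out.append(out[-dist])
            i += 3
    return bytes(out)


sample = b"abcabcabcabc" * 50
packed = lz_compress(sample)
print(len(sample), "->", len(packed), "bytes")
```

A real format would also need framing, checksums and a far better match
finder, but none of that changes the determinism argument.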

Yves
[1] if it did the issue kicking off this thread would not have
happened as there would be a test that would have noticed the change.
I have some patches I'm about to submit to address issues in this
thread, and it does add *a* test for archive output stability.

But I'm not at all confident that it's exhaustive. I just found it by
experiment, by locating tests of ours where the "git archive" output at
the end is different with gzip and "git archive gzip".

But is it guaranteed to find all potential cases where repository
content might trigger different output with different gzip
implementations? I don't know, but probably not.
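
As a rough illustration of why a byte-for-byte test is fragile, the sketch
below compresses the same payload with two different settings of Python's
gzip module, used here only as stand-ins for "different gzip
implementations", and compares both the compressed bytes and a digest of the
decompressed payload. The payload and the helper name content_digest are
made up for the example; this is not the test from the patches mentioned
above.

```python
import gzip
import hashlib


def content_digest(gz_bytes: bytes) -> str:
    """Hash the decompressed payload, ignoring how it was compressed."""
    return hashlib.sha256(gzip.decompress(gz_bytes)).hexdigest()


# Hypothetical stand-in for the tar stream that "git archive" would produce.
payload = b"example archive contents\n" * 1000

# Same container format, different compressor settings. mtime=0 pins the
# timestamp field in the gzip header so it cannot cause a difference.
fast = gzip.compress(payload, compresslevel=1, mtime=0)
best = gzip.compress(payload, compresslevel=9, mtime=0)

print("compressed bytes identical:", fast == best)
print("payload digests identical: ", content_digest(fast) == content_digest(best))
```

A checksum taken over the decompressed stream does not depend on the
compressor, which is the property that consumers hashing the compressed
tarball lose.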
BTW, I just happened to be looking at the zstd docs (I am updating
code that uses it), I saw this:

Zstandard's format is stable and documented in
[RFC8878](https://datatracker.ietf.org/doc/html/rfc8878). Multiple
independent implementations are already available.
This repository represents the reference implementation, provided as
an open-source dual [BSD](LICENSE) and [GPLv2](COPYING) licensed **C**
library,
and a command line utility producing and decoding `.zst`, `.gz`, `.xz`
and `.lz4` files.
Should your project require another programming language,
a list of known ports and bindings is provided on [Zstandard
homepage](http://www.zstd.net/#other-languages).

So it sounds like that is a spec you could use. Not sure exactly what
they mean by "stable", but given the .gz compatibility maybe it would
be worth considering. It's a lot faster than zlib. (The library I
support includes Snappy, Zlib, and Zstd, and the latter is faster and
better than the other two.)

Yves
-- 
perl -Mre=debug -e "/just|another|perl|hacker/"