Re: Stability of git-archive, breaking (?) the Github universe, and a possible solution

From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2023-01-31 11:54:08

On Tue, Jan 31 2023, brian m. carlson wrote:

> Part of the reason I think this is valuable is that once SHA-1 and
> SHA-256 interoperability is present, git archive will change the
> contents of the archive format, since it will embed a SHA-256 hash into
> the file instead of a SHA-1 hash, since that's what's in the repository.
> Thus, we can't produce an archive that's deterministic in the face of
> SHA-1/SHA-256 interoperability concerns, and we need to create a new
> format that doesn't contain that data embedded in it.

I don't see why a format change would be required in this context.

If a repository were to switch over to SHA-256 wouldn't a better
solution to this be to disambiguate whether you're requesting a SHA-1 or
SHA-256 derived archive in the URL? E.g. to never serve up an archive
with a SHA-256 embedded in the header at:

	https://github.com/git/git/archive/refs/tags/v2.39.1.tar.gz

But require a URL like:

	https://github.com/git/git/archive-sha256/refs/tags/v2.39.1.tar.gz

If you did that then existing archives would continue to have the same
byte-for-byte content (assuming that the result of this discussion is
that we support that forever), but they'd always be generated with "-c
extensions.objectFormat=sha1". For always-SHA256 repos such a URL would
fail to generate anything.

For repos that used to be SHA-1 but are now SHA-256, either URL would
work; only the PAX header would differ, referring to the SHA-1 or
SHA-256 commit, respectively.
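
The URL scheme described above can be sketched as a small routing function. This is only an illustration of the idea, not anything GitHub or git implements: the function name `object_format_for`, the `repo_formats` set, and the regular expression are all hypothetical, and the path layout is simply inferred from the two example URLs.

```python
import re

# Hypothetical path layout, inferred from the two example URLs:
#   /<owner>/<repo>/archive/<ref>.tar.gz         -> SHA-1 derived archive
#   /<owner>/<repo>/archive-sha256/<ref>.tar.gz  -> SHA-256 derived archive
ARCHIVE_PATH = re.compile(
    r"^/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/(?P<kind>archive|archive-sha256)/(?P<ref>.+)\.tar\.gz$"
)


def object_format_for(path, repo_formats):
    """Return "sha1" or "sha256" for an archive request, or None to refuse.

    repo_formats is the (assumed) set of object formats the repository can
    serve: {"sha1"}, {"sha256"}, or both for a converted repository.
    """
    match = ARCHIVE_PATH.match(path)
    if match is None:
        return None
    wanted = "sha256" if match.group("kind") == "archive-sha256" else "sha1"
    # The plain "archive" URL never yields a SHA-256 archive, so an
    # always-SHA256 repository gets no answer there.
    return wanted if wanted in repo_formats else None
```

Under these assumptions a converted repository answers on both paths, a SHA-1-only repository keeps serving exactly what it serves today, and an always-SHA256 repository only answers on the `archive-sha256` path.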

Whereas your proposal seems to be that we should omit that SHA-(1|256)
from the "comment" entirely. That would seem to require either a one-off
change of all existing archives, or some cut-off date (or other marker).

If you've got a cut-off, you could also just use it to decide whether to
generate a SHA-1 or SHA-256 archive, and without that you'd be back to
the one-off breakage.

I also find it very useful that we've got the commit OID in the archive,
as it allows for round-tripping from archives back to the relevant
repository commit. Losing that entirely for SHA-1<->SHA-256 interop
would be unfortunate, especially if it turns out we could have easily
kept it.
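
To make the round-tripping concrete: git ships "git get-tar-commit-id" for reading the ID back out of an archive, and the same information is visible to any tar reader that exposes the pax global header. The sketch below uses Python's tarfile module and a made-up 40-hex ID; the helper names are hypothetical, and the archive is built in memory purely so the example is self-contained (it stands in for real "git archive" output, which stores the ID as a "comment" record).

```python
import io
import tarfile


def write_archive_with_comment(oid):
    """Build a tiny pax-format tar whose global header carries comment=<oid>."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT,
                      pax_headers={"comment": oid}) as tf:
        data = b"hello\n"
        info = tarfile.TarInfo("README")
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_commit_id(archive_bytes):
    """Return the "comment" value from the pax global header, or None."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:") as tf:
        return tf.pax_headers.get("comment")


# Made-up object ID, for illustration only.
example_oid = "0123456789abcdef0123456789abcdef01234567"
archive = write_archive_with_comment(example_oid)
recovered = read_commit_id(archive)
```

If the comment were dropped from the header, `read_commit_id` would return None and there would be nothing to map the archive back to a commit.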

> Having said that, I don't think this should be based on the timestamp of
> the file, since that means that two otherwise identical archives
> differing in timestamp aren't ever going to be the same, and we do see
> people who import or vendor other projects.

Yes, I agree that doing this by that sort of heuristic would be bad.

> Nor do I think we should
> attempt to provide consistent compression, since I believe the output of
> things like zlib has changed in the past, and we can't continually carry
> an old, potentially insecure version of zlib just because the output
> changed.  People should be able to implement compression using gzip,
> zlib, pigz, miniz_oxide, or whatever if they want, since people
> implement Git in many different languages, and we won't want to force
> people using memory-safe languages like Go and Rust to explicitly use
> zlib for archives.

As I noted in the side-thread I think an acceptable solution would be to
push the problem of the consistent compressor downstream. I.e. if a site
like GitHub wants to maintain a potentially old version of GNU gzip that
should be up to them.
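
As a small illustration of why the compressed bytes are a separate problem from the tar stream itself, the sketch below compresses the same payload twice with Python's gzip module, changing only the timestamp stored in the gzip header (the field that "gzip -n" leaves out). The payload is a stand-in rather than a real archive, and the helper name is hypothetical. It only demonstrates the header difference; differences between compressor implementations or versions are not something a snippet like this can pin down.

```python
import gzip
import hashlib

# Stand-in for the uncompressed tar stream produced by "git archive".
payload = b"example tar stream\n" * 1000


def sha256_hex(data):
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


# Same input, same compressor, different header timestamp.
compressed_a = gzip.compress(payload, mtime=0)
compressed_b = gzip.compress(payload, mtime=1)

# The compressed files differ byte for byte...
files_differ = compressed_a != compressed_b
# ...while the content they decompress to is identical.
same_content = (sha256_hex(gzip.decompress(compressed_a))
                == sha256_hex(gzip.decompress(compressed_b)))
```

A checksum taken over the decompressed stream is unaffected by this kind of variation, whereas one taken over the .tar.gz file depends on whoever ran the compressor.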

But I think it's a valid concern that we should guarantee the stability
of the archive format.
