Thread: 46 messages, 10 authors, 2023-02-06

Re: [PATCH 0/9] git archive: use gzip again by default, document output stabilty

From: Phillip Wood <hidden>
Date: 2023-02-06 14:47:03

On 03/02/2023 13:49, Ævar Arnfjörð Bjarmason wrote:
> On Thu, Feb 02 2023, Phillip Wood wrote:
>> Reverting the change gives the misleading impression that we're making
>> a commitment to keeping the output stable.
>
> I don't see how you can conclude that from this series. It explicitly
> states that we make no such promises, what it does is go back to
> allowing the gzip(1) command to make its own promises.

This series would not be happening if we were not reverting a change to 
the compressed output of 'git archive'. The documentation updates are 
very welcome but I think we're undermining the message that the 
compressed output can change by reverting that change.

>> The focus of this thread seems to be the
>> problems relating to github which they have already addressed.
>
> Which they've addressed by reverting the change, but while they're a
> major user of git they're not the only one. They just happened to use
> "git archive".
>
> I think it would be a mistake to conclude that everyone who's run into
> this has already done so, or is aware of it.

I've spent some time trying to find reports of problems caused by this 
change and have not seen anything apart from the issue with GitHub. 
Although it takes a while for new versions of git to get into Linux 
distributions, if there is a widespread problem we normally hear about it 
pretty quickly. This change has been in two releases now. If anyone does 
have a problem there is an easy fix in the form of setting 
tar.<format>.command.
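
As a rough sketch of that workaround (the throwaway repository, file name and committer identity below are made up for illustration, and it assumes an external gzip is installed), the two built-in compressed formats can be pointed back at the external command, which is what the documentation quoted further down gives as the default before v2.38.0:

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repository, used only for this demonstration.
repo=$(mktemp -d)
git init -q "$repo"
echo hello >"$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" -c user.name=Example -c user.email=example@example.com \
    commit -q -m initial

# Pipe the tar output through the external gzip again for both
# built-in compressed formats.
git -C "$repo" config tar.tar.gz.command 'gzip -cn'
git -C "$repo" config tar.tgz.command 'gzip -cn'

# Create the same archive twice and compare the bytes.
git -C "$repo" archive --format=tar.gz -o "$repo/a.tar.gz" HEAD
git -C "$repo" archive --format=tar.gz -o "$repo/b.tar.gz" HEAD
cmp "$repo/a.tar.gz" "$repo/b.tar.gz" && echo "archives are byte-identical"
```

Passing --global to "git config" instead would apply the setting to every repository for the current user. Note that this only pins the output to whatever the installed gzip produces; it does not make the bytes stable across gzip versions.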

>> I think there is general agreement that it is not practical to promise
>> that the compressed output of "git archive" is stable so maybe it is
>> better[...]
>
> ...better than what? This seems to imply that this series is making new
> promises about the output stability, which it isn't doing.

It's better that people realize now that they cannot rely on the output 
being stable, while they can still safely work around the problem and 
work on a proper fix, rather than waiting until the change in output is 
caused by a security issue in gzip, at which point the workaround is no 
longer safe.

Best Wishes

Phillip

>> [...]to make that clear now while users can work around it in the
>> short term with a config setting rather than waiting until we're faced
>> with some security or other issue that forces a change to the output
>> which users cannot work around so easily.
>
> I think it's always been clear that you can use that setting. For ages
> we've been saying:
>
> 	The `tar.gz` and `tgz` formats are defined automatically and use the
> 	command `gzip -cn` by default.
>
> Then v2.38.0 changed it to:
>
> 	[...]
> 	magic command `git archive gzip` by default
>
> Which IMO was easily missed among other "Performance, Internal
> Implementation, Development Support etc." items in the release notes,
> which said:
>
>     Teach "git archive" to (optionally and then by default) avoid
>     spawning an external "gzip" process when creating ".tar.gz" (and
>     ".tgz") archives.
>
> But I agree that all of this is subjective. To me a 2% reduction in CPU
> use (at the cost of ~20% increase in wallclock) & some unclear benefits
> to teaching users that they can't rely on our "gzip" output seems
> unclear or hypothetical.
>
> Whereas the widespread breakage reported is very real,

Where are the reports of widespread breakage outside of GitHub?

> and we should
> consider GitHub as a canary for that, not the start & end of its
> potential impact.
>
> As we didn't have a strong reason to change this in the first place (and
> as my series shows, we can have our cake & eat it too if we don't have a
> "gzip") I think the obvious choice is to go back to using "gzip".