Re: [PATCH 00/30] [RFC] extensions.refFormat and packed-refs v2 file format

From: Elijah Newren <hidden>
Date: 2022-11-15 02:48:06

On Sun, Nov 13, 2022 at 4:07 PM Derrick Stolee [off-list ref] wrote:

On 11/11/22 6:28 PM, Elijah Newren wrote:

On Mon, Nov 7, 2022 at 11:01 AM Derrick Stolee via GitGitGadget
[off-list ref] wrote:

Introduction
============

I became interested in our packed-ref format based on the asymmetry between
ref updates and ref deletions: if we delete a packed ref, then the
packed-refs file needs to be rewritten. Compared to writing a loose ref,
this is an O(N) cost instead of O(1).
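To make the asymmetry concrete, here is a minimal Python sketch of a simplified model, not Git's actual code. It assumes a packed list of "<oid> <refname>" lines (ignoring the header and peeled entries) and uses a plain dict as a stand-in for one-file-per-ref loose storage; both helper names are hypothetical.

```python
def delete_packed_ref(packed_lines, refname):
    """Drop refname from a packed list; return (new_lines, lines_written)."""
    kept = [line for line in packed_lines if line.split(" ", 1)[1] != refname]
    # The whole file is rewritten, so every surviving entry is written again.
    return kept, len(kept)


def delete_loose_ref(loose_refs, refname):
    """Drop refname from a dict standing in for one-file-per-ref storage."""
    loose_refs.pop(refname, None)
    # One removal, independent of how many other refs exist.
    return 1


packed = [f"{i:040x} refs/heads/b{i}" for i in range(1000)]
loose = {f"refs/heads/b{i}": f"{i:040x}" for i in range(1000)}

kept, packed_writes = delete_packed_ref(packed, "refs/heads/b5")
loose_writes = delete_loose_ref(loose, "refs/heads/b5")
print(packed_writes, loose_writes)
```

In this model the packed deletion touches every remaining entry, while the loose deletion is a single operation.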

In this way, I set out with some goals:

 * (Primary) Make packed ref deletions be nearly as fast as loose ref
   updates.

Performance is always nice.  :-)

 * (Secondary) Allow using a packed ref format for all refs, dropping loose
   refs and creating a clear way to snapshot all refs at a given point in
   time.

Is this secondary goal the actual goal you have, or just the
implementation by which you get the real underlying goal?

To me, the primary goal takes precedence. It turns out that the best
way to solve for that goal happens to also make it possible to store
all refs in a packed form, because we can update the packed form
much faster than our current setup. There are alternatives that I
considered (and prototyped) that were more specific to the deletions
case, but they were not actually as fast as the stacked method. Those
alternatives also would never help reach the secondary goal, but I
probably would have considered them anyway if they were faster, if
only for their simplicity.

That's orthogonal to my question, though.  For your primary goal, you
stated it in a form where it was obvious what benefit it would provide
to end users.  Your secondary goal, as stated, didn't list any benefit
to end users that I could see (update: reading the rest of your
response it appears I just didn't understand it), so I was trying to
guess at why your secondary goal might be a goal, i.e. what the real
secondary goal was.

To me, it appears that such a capability would solve both (a) D/F
conflict problems (i.e. the ability to simultaneously have a
refs/heads/feature and refs/heads/feature/shiny ref), and (b) case
sensitivity issues in refnames (i.e. inability of some users to work
with both a refs/heads/feature and a refs/heads/FeAtUrE, due to
constraints of their filesystem and the loose storage mechanism).  Are
either of those the goal you are trying to achieve (I think both would
be really nice, more so than the performance goal you have), or is
there another?

For a Git host provider, these D/F conflict and case-sensitivity
situations probably would need to stay as restrictions on the
server side for quite some time because we don't want users on
older Git clients to be unable to fetch a repository just because
we updated our ref storage to allow for such possibilities.

Okay, but even if not used on the server side, this capability could
still be used on the client side and provide a big benefit to end
users.

But I think there's a minor issue with what you stated; as far as I
can tell, there is no case-sensitivity restriction on the server side
for GitHub currently, and users do currently have problems cloning and
using repositories with branches that differ in case only.  See e.g.
https://github.com/newren/git-filter-repo/issues/48 and the multiple
duplicates which reference that issue.  We've also had issues at
$DAYJOB, though for GHE we added some hooks to deny creating branches
that differ only in case from another branch to avoid the problem.

Also, D/F restrictions on the server do not stop users from having D/F
problems when fetching.  If users forget to use `--prune`, then when a
refs/heads/foo that has already been fetched is deleted and replaced by
a refs/heads/foo/bar, the user gets errors.  This issue actually
caused a bit of a fire-drill for us just recently.

So both kinds of problems already exist, for users with any git client
version (although the former only for users with unfortunate file
systems).  And both problems cause pain.  Both issues are caused by
loose refs, so limiting git storage to packed refs would fix both
issues.
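As a rough illustration of the two loose-ref problems described above, the following Python sketch checks a set of ref names for D/F conflicts and for names that differ only in case. It is a sketch under stated assumptions: the function names are hypothetical, and `str.casefold()` only approximates how a case-insensitive filesystem compares names.

```python
from collections import defaultdict


def df_conflicts(refnames):
    """Return (parent, child) pairs where parent would need to be both a file and a directory."""
    names = set(refnames)
    conflicts = []
    for name in sorted(names):
        parts = name.split("/")
        for i in range(1, len(parts)):
            parent = "/".join(parts[:i])
            if parent in names:
                conflicts.append((parent, name))
    return conflicts


def case_collisions(refnames):
    """Return groups of ref names that are equal after case folding."""
    groups = defaultdict(list)
    for name in sorted(set(refnames)):
        groups[name.casefold()].append(name)
    return [group for group in groups.values() if len(group) > 1]


refs = [
    "refs/heads/feature",
    "refs/heads/feature/shiny",
    "refs/heads/FeAtUrE",
    "refs/heads/main",
]
print(df_conflicts(refs))
print(case_collisions(refs))
```

Neither check would be needed if ref names were stored only as entries inside a packed file, since they would no longer be mapped onto filesystem paths.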

The biggest benefit on the server side is actually for consistency
checks. Using a stacked packed-refs (especially with a tip file
that describes all of the layers) allows an atomic way to take a
snapshot of the refs and run a checksum operation on their values.
With loose refs, concurrent updates can modify the checksum during
its computation. This is a super niche reason for this, but it's
nice that the performance-only focus also ends up with a design
that satisfies this goal.
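The consistency-check idea can be sketched in Python as follows. This is only an illustration: a dict copy stands in for an atomically captured set of packed layers, and the `refs_checksum` helper and its line format are assumptions, not anything from the patch series.

```python
import hashlib


def refs_checksum(snapshot):
    """Hash a fixed mapping of refname -> object id in sorted order."""
    digest = hashlib.sha256()
    for name, oid in sorted(snapshot.items()):
        digest.update(f"{oid} {name}\n".encode())
    return digest.hexdigest()


live = {
    "refs/heads/main": "a" * 40,
    "refs/heads/topic": "c" * 40,
}

# Capture the snapshot once, then checksum the frozen copy.
snap = dict(live)
before = refs_checksum(snap)

# A concurrent update changes the live refs but not the snapshot.
live["refs/heads/main"] = "b" * 40

print(refs_checksum(snap) == before)
print(refs_checksum(live) == before)
```

The checksum over the frozen copy stays stable while the live set moves on, which is the property that is hard to get when each ref is a separately updated loose file.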

Ah...so this is the reason for your secondary goal?  Re-reading, it
looks like you did state this; I just missed it without the longer
explanation.

Anyway, it might be worth calling out in your cover letter that there
are (at least) three benefits to this secondary goal of yours -- the
one you list here, plus the two I list above.
