[TOPIC 1/12] Next-gen reference backends
From: Taylor Blau <hidden>
Date: 2023-10-02 15:17:54
(Presenter: Patrick Steinhardt, Notetaker: Karthik Nayak)
* Summary: There have been multiple proposals for reference backends on the
mailing list. The goal is to converge on one solution.
* Problem: At GitLab we have certain repos with very large numbers of
references. Some repos have millions of refs, which causes scalability issues.
* Current files backend uses a combination of loose files and packed-refs.
* Deletion performance is bad.
* Reference lookups are slow.
* Storage space is also large.
* Some patches have improved the situation, e.g. the skip-list for
packed-refs by Taylor.
* Atomic updates are currently not possible.
* This is not an issue faced only by GitLab.
* Two solutions proposed:
* Reftables: Originally implemented by JGit (Shawn Pearce, 2017)
* Google was storing the data in a table with one ref per row. This data
was encrypted, which changes the ordering.
* This led to the realization that the ref storage itself was not optimal, so
Shawn made a proposal based on existing solutions at Google, which was then
implemented in JGit.
* This solved the ref storage problem at Google.
* Adoption of the JGit implementation was low because of the compatibility
requirement with CGit.
* A new patch series was submitted which swaps out packed-refs for
reftables while keeping the existing file-based loose refs.
* An incremental take on the reference backend (aka packed-refs v2) by Derrick
* Uses pre-existing infrastructure in the Git project, which makes it a more
natural extension.
* The first part was to support a multi-backend structure.
* The second part was packed-refs v2 in the Git project.
* Question: How do we take it forward from here?
* Emily: If the existing backend existed as a library, it might be easier to
replace and experiment with.
* Jeff: A lot of work in that direction has already landed, but the
implementation still bleeds into other parts of the code. It might be messy
to clean up.
* Patrick: Different implementations by different hosting providers with
different requirements might cause issues for clients.
* Deletion performance is not the only issue faced (at GitLab); there are also
deadlocks around this.
* brian: If you have a large number of remote-tracking refs, you face the same
perf issues.
* Patrick: Is there any preference for which solution to go forward with?
GitLab is interested in picking this up and would mostly go forward with
reftables.
* Reftables support tombstoning, which should solve the problem with multiple
deletions.
* There is still a problem with refs being a prefix of other refs.
* Is there a world where loose refs are removed completely and replaced with
reftables?
* Debugging is much easier with loose refs; reftables are a binary format,
so additional tooling might be needed here. This has already been proven
to work at Google.
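
To make the tombstoning point above concrete, here is a minimal sketch of the idea, not the actual reftable on-disk format or Git's implementation. It assumes a simplified model in which a ref database is a stack of tables (newest last) and a deletion is recorded as a tombstone entry in a new table rather than by rewriting older ones. The names `lookup`, `delete_refs`, `compact` and `TOMBSTONE` are hypothetical and exist only for this illustration.

```python
# Simplified, hypothetical model of tombstoning in a stack of ref tables.
# Each table maps a ref name to an object id, or to TOMBSTONE for a deletion.
TOMBSTONE = None


def lookup(stack, name):
    """Return the object id for `name`, or None if absent or deleted."""
    # Newer tables shadow older ones, so search from the top of the stack.
    for table in reversed(stack):
        if name in table:
            return table[name]  # TOMBSTONE (None) means "deleted"
    return None


def delete_refs(stack, names):
    """Delete many refs by appending one table of tombstones."""
    # Older tables are left untouched; the cost is proportional to the
    # number of deleted refs, not to the total number of refs stored.
    stack.append({name: TOMBSTONE for name in names})


def compact(stack):
    """Merge all tables into one and drop entries shadowed by tombstones."""
    merged = {}
    for table in stack:  # oldest to newest, so later entries win
        merged.update(table)
    live = {n: oid for n, oid in sorted(merged.items()) if oid is not TOMBSTONE}
    return [live]


stack = [{"refs/heads/main": "a1", "refs/heads/topic": "b2"}]
delete_refs(stack, ["refs/heads/topic"])
print(lookup(stack, "refs/heads/topic"))
print(lookup(stack, "refs/heads/main"))
print(compact(stack))
```

In this model, deleting many refs appends one small table, whereas a single sorted file such as packed-refs has to be rewritten to remove entries. Tombstones only disappear when tables are compacted.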