Thread (14 messages), 1 author, 2023-10-02

[TOPIC 1/12] Next-gen reference backends

From: Taylor Blau <hidden>
Date: 2023-10-02 15:17:54

(Presenter: Patrick Steinhardt, Notetaker: Karthik Nayak)

* Summary: There have been multiple proposals for reference backends on the
  mailing list. The goal is to converge on one solution.
* Problem: At GitLab we have certain repos with large numbers of references.
  Some repos have millions of refs, which causes scalability issues.
   * Current files backend uses a combination of loose files and packed-refs.
   * Deletion performance is bad.
   * Reference lookups are slow.
   * Storage space is also large.
   * Some patches have improved the situation, e.g. the skip-list for
     packed-refs by Taylor.
   * Atomic updates are currently not possible.
   * This issue is not unique to GitLab.
* Two solutions proposed:
   * Reftables: Originally implemented by JGit (Shawn Pearce, 2017)
      * Google was storing the data in a table with one ref per row. This data
        was encrypted, which changes the ordering.
      * This led to the realization that the ref storage itself was not optimal,
        so Shawn made a proposal based on existing solutions at Google, and it
        was implemented in JGit.
      * This solved the ref storage problem at Google.
      * Adoption of the JGit implementation was low because of the compatibility
        requirement with CGit.
      * A new patch series has been submitted which swaps out packed-refs for
        reftables while keeping the existing file-based loose refs.
   * Incremental take on reference backend (aka. packed-refs v2) by Derrick
      * Uses pre-existing infrastructure in the Git project, which makes it a
        more natural extension.
      * First part was to support a multi backend structure
      * Second part was packed references v2 in the Git project
* Question: How do we take it forward from here?
   * Emily: If the existing backend existed as a library, it might be easier to
     replace and experiment with.
      * Jeff: A lot of work in that direction has already landed, but some of
        the implementation still bleeds into other parts of the code. It might
        be messy to clean up.
      * Patrick: Different implementations by different hosting providers with
        different requirements might cause issues for clients.
   * Deletion performance is not the only issue faced (at GitLab); there are
     also deadlocks around this.
   * brian: If you have a large number of remote tracking refs you face the same
     perf issues.
   * Patrick: Is there any preference for which solution to go forward with?
     GitLab is interested in picking this up and would mostly go forward with
     reftables.
   * Reftables do support tombstoning, which should solve the problem with
     multiple deletions.
      * There is still a problem with refs being a prefix of other refs.
   * Is there a world where loose refs are removed completely and replaced with
     reftables?
      * Debugging is much easier with loose refs; reftables use a binary
        format, so additional tooling might be needed here. This has already
        been proven to work at Google.
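
The tombstoning and prefix points above can be sketched with a toy model. This
is not the reftable file format or any Git API: the RefStack class, the
TOMBSTONE marker and the prefix_conflict helper are hypothetical names invented
for illustration, and plain dicts stand in for the sorted binary tables. It
assumes deletions are recorded as tombstone entries in a new table stacked on
top of older ones, and that the prefix problem refers to one ref name being a
path prefix of another (e.g. refs/heads/main vs. refs/heads/main/fix).

```python
# Toy model only: dicts stand in for sorted binary tables.
TOMBSTONE = None  # hypothetical marker meaning "this ref was deleted"


class RefStack:
    """A stack of ref tables, oldest first, newest last."""

    def __init__(self):
        self.tables = []

    def update(self, changes):
        """Record a batch of ref changes as one new table."""
        self.tables.append(dict(changes))

    def delete(self, names):
        """Delete many refs by adding a single table of tombstones."""
        self.update({name: TOMBSTONE for name in names})

    def lookup(self, name):
        """Search newest to oldest; a tombstone hides any older value."""
        for table in reversed(self.tables):
            if name in table:
                return table[name]
        return None

    def compact(self):
        """Merge all tables into one and drop the tombstones."""
        merged = {}
        for table in self.tables:
            merged.update(table)  # newer tables overwrite older entries
        live = {n: v for n, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.tables = [live]


def prefix_conflict(new_ref, existing):
    """Return an existing ref that is a path prefix of new_ref (or the reverse)."""
    for ref in existing:
        if new_ref.startswith(ref + "/") or ref.startswith(new_ref + "/"):
            return ref
    return None


stack = RefStack()
stack.update({"refs/heads/main": "a1", "refs/heads/topic": "b2"})
stack.delete(["refs/heads/topic"])  # one new table, no rewrite of old data
print(stack.lookup("refs/heads/topic"))
print(prefix_conflict("refs/heads/main/fix", ["refs/heads/main"]))
```

In this model a bulk deletion costs one appended table regardless of how many
refs already exist, and the old data is only rewritten when compact() runs. The
prefix check is a separate concern that tombstones do not address.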