[NOTES 08/11] Resumable fetch / push
From: Taylor Blau <hidden>
Date: 2025-10-06 19:20:14
Topic: Resumable fetch/push
Leader: Caleb (was Scott, but he's not here)

* Is this only client side, or server side too?
* Applies to both, as GitButler has a forge too. Would be nice to have protocol improvements.
* Both bundle-uris and packfile-uris exist, and at least packfile-uris are resumable. Both are fetch-only, so push is unsolved.
* Could use single-threaded output or server-side caching to make pushing work.
* Maybe make it so servers could receive a bundle, and make that resumable.
* Use cases: pushing a repo for the first time to a new server, once there's good large file support, Android/Chromium. Also a problem that's independent of size in environments with poor connectivity (some countries, Caltrain, …).
* Servers could hand out some kind of opaque data with the fetch to indicate what they have cached. Clients can re-send that when attempting to resume, and the server can choose to do something with it or not.
* GitHub support has told people to create a branch with N commits at a time to fetch.

Scrambly notes (Jack's notes):

* Specific forge implementation, HTTP-based communication -> easier to set up; keen on protocol improvements that allow large packfiles to be sent between client and server.
* For packfile-uris at least, the packfile part served from the URI is already resumable; bundle URIs may not be the same, so that might be low-hanging fruit.
* Taylor: the push side is more interesting. For fetch, the server can say "I already sent you the first m bytes of x"; push needs something equivalent to be resumable.
* Consider implications as an attack vector.
* Brian: Git's pack implementation is deterministic if you don't use multithreading. Could use a resumable mode, the way gzip has an rsyncable mode. On the client side, write the pack to a temporary file; that is resumable with an offset, and since the pack is cached locally, it should be something you could resume with push.
* Some possibilities if we cache on the server side or use single-threaded output.
* An idea from packfile-uris which could help solve the fetch problem: the server provides a URL to the client; let the server be the fetcher.
* Emily: that would work pretty OK using a commit cloud server that is already serving those objects. The server side can resume if necessary.
* Servers don't receive bundles, so this would mean adding support for servers to receive bundles. What's the real use case for this? It's worth its own protocol, not just a push protocol. When we try to mirror things in Gerrit, it fails due to the large number of refs; would need an enhancement to handle large numbers of refs.
* Caleb: So you suggest some sort of TCP protocol for handling these transfers?
* We have users storing binaries who time out uploading to the server; it's not just a migration path.
* Having some way of guaranteeing forward progress on a push or a pull, as long as you can get some smaller unit of data through, would be very useful. Don't know how small to go.
* We talked about a chunk format before; would introducing a chunk format with small enough chunks help?
* If it's small enough and reproducible.
* Elijah: Even with small chunks, if they're all part of the same communication, you'll need to restart it.
* If you have to resume, say you have sent X chunks, then you have N - X left.
* Peff: All you need to know is the byte offset.
* Elijah: Take the objects that you have received and say "I have these objects".
* What if you hash what you got: "I asked for this, here is the hash of the first <length> bytes, give me the rest"?
* Peff: The server has to be able to regenerate everything from scratch; are you caching it? Kind of wasteful.
* Doesn't need to be cached, just needs to be stable, so if there was a way to ask for it in a specific order...
* Disable multithreading.
* Peff: Looked into this with resumable clones. The server can pass out some cache tag: "here's an opaque tag that may or may not be valid in the future", then "I got X bytes of this tag, can you send the rest?" It becomes a heuristic on the server ("I'll choose how much to cache"); Git doesn't need to know about that, it's an implementation issue.
* With a packfile URI, you stop talking to the server and fetch from the URI instead.
* If you were trying to brute-force it today, you would brute-force it by sending a ref.
* Peff: GitHub support has told people to do that.
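A minimal Python sketch of the "stable output plus byte offset" idea discussed above (Brian's deterministic single-threaded packs, Peff's "all you need is the byte offset", and the "hash what you got" suggestion). It assumes a hypothetical `generate_pack()` standing in for a deterministic pack writer; real packs have a header, compressed entries, and a trailing checksum, none of which are modeled here. The client reports how many bytes it holds plus a hash of them; the server regenerates the stream without caching anything and sends only the tail if the prefix still matches.

```python
import hashlib
from typing import List, Optional, Tuple


def generate_pack(objects: List[bytes]) -> bytes:
    # Hypothetical stand-in for a deterministic (single-threaded) pack
    # writer: sorting fixes the order, so equal input gives equal bytes.
    return b"".join(sorted(objects))


def resume_request(received: bytes) -> Tuple[int, str]:
    # Client side: how many bytes we hold, plus a hash of exactly those bytes.
    return len(received), hashlib.sha256(received).hexdigest()


def serve_resume(objects: List[bytes], offset: int, prefix_hash: str) -> Optional[bytes]:
    # Server side: nothing is cached; the stream is regenerated from scratch.
    stream = generate_pack(objects)
    if offset > len(stream):
        return None
    # If the regenerated prefix differs, the output was not stable and
    # the client has to restart from offset 0.
    if hashlib.sha256(stream[:offset]).hexdigest() != prefix_hash:
        return None
    return stream[offset:]


if __name__ == "__main__":
    objs = [b"blob-b", b"blob-a", b"tree-1"]
    full = generate_pack(objs)
    partial = full[:7]  # pretend the connection dropped after 7 bytes
    offset, digest = resume_request(partial)
    rest = serve_resume(objs, offset, digest)
    assert rest is not None and partial + rest == full
```

The prefix hash is what makes "doesn't need to be cached, just needs to be stable" safe: if the server's output ever changes (for example, multithreaded delta search picks different bases), the mismatch is detected instead of silently splicing two different streams.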
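The opaque cache-tag approach Peff described for resumable clones might look roughly like this on the server side. `PackCache`, `start_fetch()`, and `resume()` are hypothetical names, not Git code, and the eviction policy (keep only the most recent `max_entries` packs) is an arbitrary stand-in for whatever heuristic a server chooses. The client only echoes the tag back with its byte count; an unknown or expired tag yields `None`, and the client falls back to an ordinary fetch.

```python
import secrets
from typing import Dict, Optional


class PackCache:
    """Hypothetical server-side cache of generated packs, keyed by opaque tags."""

    def __init__(self, max_entries: int = 2) -> None:
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        self.entries: Dict[str, bytes] = {}  # insertion-ordered: oldest first

    def start_fetch(self, pack: bytes) -> str:
        # The tag is meaningless to the client; it just sends it back later.
        tag = secrets.token_hex(8)
        if len(self.entries) >= self.max_entries:
            # Server policy: evict the oldest cached pack.
            oldest = next(iter(self.entries))
            del self.entries[oldest]
        self.entries[tag] = pack
        return tag

    def resume(self, tag: str, offset: int) -> Optional[bytes]:
        # "I got `offset` bytes of this tag, can you send the rest?"
        pack = self.entries.get(tag)
        if pack is None or offset > len(pack):
            return None  # tag expired or unknown: client does a normal fetch
        return pack[offset:]


if __name__ == "__main__":
    cache = PackCache(max_entries=1)
    tag = cache.start_fetch(b"PACKDATA")
    assert cache.resume(tag, 4) == b"DATA"
```

Because the tag carries no promise, the server is free to tune how much it caches (or to cache nothing) without any protocol change, which matches the point that this is an implementation issue Git itself doesn't need to know about.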
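The brute-force workaround mentioned at the end (pushing a ref that advances N commits at a time) can be scripted from the client. This sketch assumes the commit list comes from `git rev-list --reverse --first-parent HEAD` (oldest first) and that pushing to a scratch branch on the remote is acceptable; `batch_push_points()`, `push_commands()`, and the branch name `resume-tmp` are hypothetical. It only builds the command strings rather than running them.

```python
from typing import List


def batch_push_points(commits: List[str], n: int) -> List[str]:
    """Every n-th commit (oldest first), always ending with the tip.

    `commits` is assumed to be oldest-first, e.g. the output of
    `git rev-list --reverse --first-parent HEAD`.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    points = commits[n - 1::n]  # slicing copies, so the input is untouched
    if commits and (not points or points[-1] != commits[-1]):
        points.append(commits[-1])  # make sure the final push reaches the tip
    return points


def push_commands(points: List[str], remote: str, branch: str) -> List[str]:
    # Each command advances the same remote branch one batch further.
    return [f"git push {remote} {sha}:refs/heads/{branch}" for sha in points]


if __name__ == "__main__":
    history = ["c1", "c2", "c3", "c4", "c5"]  # placeholder commit IDs
    for cmd in push_commands(batch_push_points(history, 2), "origin", "resume-tmp"):
        print(cmd)
```

With `--first-parent`, each push point is a descendant of the previous one, so every push is a fast-forward of the scratch branch and only sends objects that are new since the last batch. History brought in by a merge still travels with that merge, so one batch can be much larger than N commits; a failed push only loses the current batch, not the whole transfer.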