Thread (12 messages), 1 author, 2025-10-06

[NOTES 08/11] Resumable fetch / push

From: Taylor Blau <hidden>
Date: 2025-10-06 19:20:14

Topic: Resumable fetch/push
Leader: Caleb (was Scott, but he's not here)

* Is this only client side or server side too?
	* Applies to both as GitButler has a forge too. Would be nice to have protocol
		improvements.
* Both bundle-uris and packfile-uris exist and at least packfile-uris are
	resumable. Both are fetch-only, so push is unsolved.
* Could use single-threaded output or server-side caching to make pushing work.
* Maybe make it so servers could receive a bundle and make that resumable.
* Use cases: Pushing a repo for the first time to a new server, once there's
	good large file support, android/chromium. Also a problem that's independent
	of size in environments with poor connectivity (some countries, Caltrain, …).
* Servers could hand out some kind of opaque data with the fetch to indicate
	what it has cached, clients can re-share that when attempting to resume and
	the server can choose to do something with it or not.
* GitHub support has told people to create a branch with N commits at a time to
	fetch.
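
The opaque-data idea above can be sketched in a few lines of Python. This is
only an illustration under loud assumptions: `PackCache`, `start`, `evict` and
`resume` are hypothetical names, not part of git or any forge, and a real
server would cache on disk with its own eviction policy. The point it shows is
that the tag means nothing to the client, and that the server may decline to
honour it, in which case the client falls back to a full fetch.

```python
import os


class PackCache:
    """Toy server-side cache keyed by an opaque tag (hypothetical, not a git API)."""

    def __init__(self):
        self._streams = {}

    def start(self, pack_bytes):
        # Hand out a random tag alongside the response; the client only echoes it back.
        tag = os.urandom(16).hex()
        self._streams[tag] = pack_bytes
        return tag

    def evict(self, tag):
        # The server is free to forget cached data whenever it likes.
        self._streams.pop(tag, None)

    def resume(self, tag, offset):
        """Return the bytes after `offset`, or None if the tag is no longer usable."""
        data = self._streams.get(tag)
        if data is None or offset > len(data):
            return None  # caller must restart the transfer from scratch
        return data[offset:]
```

A client that received the first 100 bytes would call `resume(tag, 100)` and
append the result to what it already has; a `None` answer means "start over".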


Scrambly notes (Jack's notes):


* Specific forge implementation, HTTP-based communication -> easier to set up,
	keen on improvements to the protocol that allow large packfiles to be sent
	between client and server
* For packfile-uris, at least the packfile part that is in the URI is already
	resumable; for bundle-uris that may not be the case. Might be low-hanging
	fruit
* Taylor: push side more interesting: server -> already sent you first m bytes
	of x, need something to send the resumable push
* Consider implications as an attack vector
* Brian: git's pack implementation is deterministic if you don't do
	multithreading, so there could be a resumable mode, much like gzip has an
	rsyncable mode. On the client side, pack to a temporary file; this is
	resumable with an offset, and since the pack is cached locally it should be
	something you could resume a push with. Some possibilities if we cache on
	the server side or use single-threaded output
* An idea from packfile-uris which could help solve the fetch problem: the
	server provides a URL to the client, let the server be the fetcher
* Emily: that would work pretty OK using a commit cloud server that is already
	serving those objects. The server side can resume as necessary.
* Servers don't receive bundles, so would be adding support for server to
	receive bundles. What's the real use case for this? It's worth its own
	protocol, not just a push protocol. When we try to mirror things in Gerrit it
	fails due to large number of refs - would need an enhancement to handle large
	numbers of refs.
* Caleb: So you suggest some sort of TCP protocol for handling these transfers?
* We have users who store binaries and hit timeouts uploading to the server,
	so it's not just a migration path
* Having some way of guaranteeing forward progress on a push or a pull as long
	as you can get some smaller unit of data transfer, don't know how small to go,
	but would be very useful
* We talked about chunk format before, would introducing chunk format, small
	enough chunks help?
* If it's small enough and reproducible
* Elijah: Even if you have small chunks, if they are part of the same
	communication, if they're small enough you'll need to restart it
* If you have to resume now say you have sent X chunks then you have N - X left
* Peff: All you need to know is the byte offset.
* Elijah: Take the objects that you have received and say "I have these objects"
* What if you hash what you got, "I asked for this", the hash was this length,
	give me the rest
* Peff: Has to be able to regenerate everything from scratch, are you caching
	it? Kind of wasteful
* Doesn't need to be cached, just needs to be stable, so if there was a way to
	ask for it in a specific order
* Disable multithreading
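
The "byte offset plus hash of what you got" exchange discussed above might
look roughly like the following Python sketch. Everything here is assumed for
illustration: `resume_request`, `serve_resume` and the `regenerate` callable
are hypothetical, SHA-256 is an arbitrary choice, and no such exchange exists
in the git protocol today. It shows why the stream only needs to be stable,
not cached: the server regenerates it, checks that the client's prefix still
matches, and sends the tail.

```python
import hashlib


def resume_request(received):
    """Client side: describe the partial data as (length, SHA-256 hex of it)."""
    return len(received), hashlib.sha256(received).hexdigest()


def serve_resume(regenerate, offset, prefix_hash):
    """Server side: return the bytes after `offset` if the prefix still matches.

    `regenerate` is a hypothetical callable that rebuilds the full byte stream;
    it must produce identical bytes every time (e.g. single-threaded packing).
    """
    stream = regenerate()
    if offset > len(stream):
        return None
    if hashlib.sha256(stream[:offset]).hexdigest() != prefix_hash:
        return None  # output was not stable, so the client must start over
    return stream[offset:]
```

As Peff points out, regenerating everything just to skip the first `offset`
bytes costs the server the full packing work again; the sketch trades that
CPU time for not having to keep a cache.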


* Peff: Looked into this with resumable clones, server can pass out some cache
	tag, here's an opaque tag that may or may not be valid in the future, I got X
	bytes of this tag can you send the rest. Becomes a heuristic on the server
	"I'll choose how much to cache", git doesn't need to know about that it's an
	implementation issue
* With a pack file uri you stop what you're doing talking to the server
* If you were trying to brute force it today, you would brute force sending a
	ref
* Peff: GitHub support has told people to do that
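
The brute-force workaround mentioned above, moving a ref forward a bounded
number of commits at a time, can be sketched in Python as follows. The helper
names `checkpoints` and `push_commands` are made up for this example, and the
commit list is assumed to be oldest-first (for instance from
`git rev-list --reverse --first-parent <branch>`). The sketch only builds the
`git push <remote> <sha>:refs/heads/<branch>` command lines; it does not run
them.

```python
def checkpoints(commits, step):
    """Pick every `step`-th commit from an oldest-first list, always ending at the tip."""
    if step < 1:
        raise ValueError("step must be at least 1")
    picked = commits[step - 1::step]
    # Make sure the final push lands on the real branch tip.
    if commits and (not picked or picked[-1] != commits[-1]):
        picked.append(commits[-1])
    return picked


def push_commands(commits, step, remote, branch):
    """Build one `git push` argv per checkpoint; nothing is executed here."""
    return [
        ["git", "push", remote, f"{sha}:refs/heads/{branch}"]
        for sha in checkpoints(commits, step)
    ]
```

Each push that succeeds is kept by the server, so a dropped connection only
costs the current step. Note that with merges in the history a single step can
still pull in many more than `step` commits, since pushing a commit sends all
of its missing ancestors.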