Thread (77 messages) 77 messages, 4 authors, 2022-04-22

Re: [PATCH 2/3] protocol v2: specify static seeding of clone/fetch via "bundle-uri"

From: Derrick Stolee <hidden>
Date: 2021-10-27 19:23:27

On 10/27/2021 2:01 PM, Ævar Arnfjörð Bjarmason wrote:
On Wed, Oct 27 2021, Derrick Stolee wrote:
quoted
On 10/27/2021 4:29 AM, Ævar Arnfjörð Bjarmason wrote:
quoted
On Tue, Oct 26 2021, Derrick Stolee wrote:
quoted
On 10/25/2021 5:25 PM, Ævar Arnfjörð Bjarmason wrote:
quoted
Add a server-side implementation of a new "bundle-uri" command to
protocol v2. As discussed in the updated "protocol-v2.txt" this will
allow conforming clients to optionally seed their initial clones or
incremental fetches from URLs containing "*.bundle" files created with
"git bundle create".
...
quoted
+DISCUSSION of bundle-uri
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The intent of the feature is optimize for server resource consumption
+in the common case by changing the common case of fetching a very
+large PACK during linkgit:git-clone[1] into a smaller incremental
+fetch.
+
+It also allows servers to achieve better caching in combination with
+an `uploadpack.packObjectsHook` (see linkgit:git-config[1]).
+
+By having new clones or fetches be a more predictable and common
+negotiation against the tips of recently produces *.bundle file(s).
+Servers might even pre-generate the results of such negotiations for
+the `uploadpack.packObjectsHook` as new pushes come in.
+
+I.e. the server would anticipate that fresh clones will download a
+known bundle, followed by catching up to the current state of the
+repository using ref tips found in that bundle (or bundles).
+
+PROTOCOL for bundle-uri
+^^^^^^^^^^^^^^^^^^^^^^^
+
+A `bundle-uri` request takes no arguments, and as noted above does not
+currently advertise a capability value. Both may be added in the
+future.
One thing I realized was missing from this proposal is any interaction
with partial clone. It would be disappointing if we could not advertise
bundles of commit-and-tree packfiles for blobless partial clones.

There is currently no way for the client to signal the filter type
during this command. Not having any way to extend to include that
seems like an oversight we should remedy before committing to a
protocol that can't be extended.

(This also seems like a good enough reason to group the URIs into a
struct-like storage, because the filter type could be stored next to
the URI.)
I'll update the docs to note that. I'd definitely like to leave out any
implementation of filter/shallow for an initial iteration of this for
simplicity, but the protocol keyword/behavior is open-ended enough to
permit any extension.
It would be good to be explicit about how this would work. Looking at
it fresh, it seems that the server could send multiple bundle URIs with
the extra metadata to say which ones have a filter (and what that filter
is). The client could then check if a bundle matches the given filter.

But this is a bit inverted: the filter mechanism currently has the client
request a given filter and the server responds with _at least_ that much
data. This allows the server to ignore things like pathspec-filters or
certain size-based filters. If the client just ignores a bundle URI
because it doesn't match the exact filter, this could lead the client to
ask for the data without a bundle, even if it would be faster to just
download the advertised bundle.

For this reason, I think it would be valuable for the client to tell
the server the intended filter, and the server responds with bundle
URIs that contain a subset of the information that would be provided
by a later fetch request with that filter.
quoted
I.e. the server can start advertising "bundle-uri=shallow", and future
clients can request arbitrary key-value pairs in addition to just
"bundle-uri" now.

Having said that I think that *probably* this is something that'll never
be implemented, but maybe I'll eat my words there.
I didn't mean to elide past "filter", but was just using "shallow" as a
short-hand for one thing in the "fetch" dialog that a client can mention
that'll impact PACK generation, just like filter.

Having thought about this a bit more, I think it should be an invariant
in any bundle-uri design that the server shouldn't communicate any
side-channel information whatsoever about a bundle it advertises, if
that information can't be discovered in the header of that bundle file.

Mind you, this means throwing out large parts of my current proposed
over-the-wire design, but I think for the better. I.e. the whole
response part where we communicate:

    (bundle-uri (SP bundle-feature-key (=bundle-feature-val)?)* LF)*
    flush-pkt

Would just become something like:

    (bundle-uri delim-pkt bundle-header? delim-pkt)*
    flush-pkt

I.e. we'd optionally transfer the content of the bundle header (content
up to the first "\n\n") to the client, but *only* ever as a shorthand
for saving the client a roundtrip.
It still seems like we're better off letting the client request a
filter and have the server present URIs that the client can use,
and the server can choose to ignore the filter or provide URIs that
are specific to that filter. Sending the full list and making the
client decide what it wants seems like it will be more complicated
than necessary.

However, I'll withhold complete judgement until I see a full proposal
in a v2.
I think the sweet spot for "bundle-uri" is to advertise a small number
of bundles that encompass common clone/fetch patterns. I.e. something
like a bundle for a full clone with the repo data up to today, and maybe
a couple of other bundles covering a time period that clients would be
likely to incrementally update within, e.g. 1 week ago..today &&
yesterday..now.

I agree that adding say "full clone, --depth=1" and "full clone, no
blobs" etc. to that might be *very* useful for some deployments, but per
the above I think we should really add that to the bundle format first,
not protocol v2.
 
I'm focusing my interest in this topic not on "how can we make what we
already do faster?" but "how can we unlock scale not previously
possible?" Allowing blobless clones is an important part of this, in
my opinion, so it is my _default_ mode of operating.

Thanks,
-Stolee
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help