Thread (77 messages) 77 messages, 4 authors, 2022-04-22

Re: [PATCH 2/3] protocol v2: specify static seeding of clone/fetch via "bundle-uri"

From: Derrick Stolee <hidden>
Date: 2021-10-27 01:55:37

On 10/26/2021 11:00 AM, Ævar Arnfjörð Bjarmason wrote:
On Tue, Oct 26 2021, Derrick Stolee wrote:
quoted
The implementation seems simple enough, which I like. I'm a bit
I mentally inserted the missing "skeptical/uncertain" etc. here :)
More "uncertain" than "skeptical". The current plan works perfectly
for the current implementation, so there is an element of YAGNI that
could easily lead us to avoid overthinking this.
 
quoted
your current use of Git config as the back-end, only because it is
difficult to be future-proof. As the functionality stands today, the
current config design works just fine. Perhaps we don't need to
worry about the future, because we can design a new, complementary
storage for that extra data. It seems worth exploring for a little
while, though. Perhaps we should take a page out of 'git-remote'
and how it stores named remotes with sub-items for metadata. The
names probably don't need to ever be exposed to users, but it could
be beneficial to anyone implementing this scheme.

[bundle "main"]
	uri = https://example.com/my-bundle
	uri = https://redundant-cdn.com/my-bundle
	size = 120523
	sha256 = {64hexchars}

[bundle "fork"]
	uri = https://cdn.org/my-fork
	size = 334
	sha256 = {...}
	prereq = {oid}

This kind of layout has an immediate grouping of data that should
help any future plan. Notice how I included multiple "uri" lines
in the "main", which helps with your plan for duplicate URIs.
At first sight I like that config schema much better than my current
one, in particular how it makes the future-proofed "these N urls are one
logical URL" case simpler.

But overall I'm a bit on the fence, and leaning towards keeping what I
have, not out of any lazynes or wanting to just keep what I have mind
you.

But rather that the main benefit of the current one is that it's a 1=1
mapping to the line-based protocol, and you can say update your URLs as:

    git config --replace-all uploadpack.bundleUri "$first_url" &&
    git config --add uploadpack.bundleUri "$second_url"

Having usually you'd know the URL you'd like to replace, so you can use
the [value-pattern] of --replace-all, if it's a named section or other
split-out structure that become a two-step lookup.
Don't forget to use --fixed-value for exact string matching instead of
regex matching!
Also for testing I've got a (trivial) plumbing tool I'll submit called
"git ls-remote-bundle-uri" (could be folded into something else I guess)
to dump the server-side config, it's nice that you can pretty much
directly copy/paste it into config without needing to adjust it.
With the appropriate helper structs and methods in the product code,
such helper tools will still be simple without being a second place
that is directly aware of how the values are stored to disk. I don't
judge your prototype work that helps you build the feature, but it's
simultaneously not a reason to stick to a design.
Having said all that I'm not sure I feel strongly about it either way,
what do you think given the above?
I'm not feeling too strong about it right now. The current design
does not need anything extra, but it also purposefully leaves certain
things open for extension in the future.

The thing I worry about is that there will be two supported ways to
store a list of bundle URIs: a flat list of URIs in the multi-valued
uploadPack.bundleURI config value, but then also a second option that
allows the extensions that arise. It's a layer of complication that
would be nice to avoid if there was an easy way to do it, but the
schema I sketched earlier isn't simple enough to merit a switch right
now. Perhaps someone else will have an idea that accomplishes the
same goal, but also is less complicated?
 
I think most "real" server operators will use this as
GIT_CONFIG_COUNT=<n> GIT_CONFIG_{KEY,VALUE}_<1..n>, which can of course
work with any config schema, but if you've got code generating it on the
other side naming the targets is probably a slight hassle / confusing.
You'd really overload the environment this way? That's not how I would
approach it, but maybe there is a benefit over writing to the repository's
config file. I suppose that you could store the data in a database and
link it to the repository at runtime instead.
There's also the small matter of it being consistent with the
packfile-uri config in its current form, but that shouldn't be a reason
not to come up with something better. If anything any better suggestion
(if we go for that) could be supported by it too...
What do you mean about being consistent with packfile-uri? This layer
that we care about isn't even implemented in git.git.

Thanks,
-Stolee
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help