Thread (8 messages), 4 authors, 2022-08-26

Re: [PATCH] [RFC] list-objects-filter: introduce new filter sparse:buffer=<spec>

From: ZheNing Hu <hidden>
Date: 2022-08-12 15:49:24

Jeff King [off-list ref] wrote on Thu, Aug 11, 2022 at 05:15:
On Tue, Aug 09, 2022 at 09:37:09AM -0400, Derrick Stolee wrote:
Was the reason why we have "we limit to an object we already have"
restriction because we didn't want to blindly use a piece of
uncontrolled arbitrary end-user data here?  Just wondering.
One of the ideas here was to limit the opportunity of sending an
arbitrary set of data over the Git protocol and avoid exactly the
scenario you mention.
One other implication here is that the filter spec is sent inside of a
pkt-line.  So the implementation here is limiting us to 64kb. That may
sound like a lot for simple specs, but I imagine in big repos they can
possibly get pretty complex.

That would be fixable with a protocol extension to take the data over
multiple pkt-lines.
This sounds very scary: a filter rules file of 64KB (or more)...
If the filter is really that big, I think the server will be really slow to parse it.
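To make the size limit discussed above concrete, here is a minimal Python sketch of pkt-line framing. It assumes the documented pkt-line format: a 4-digit hex length that counts its own 4 bytes, with a maximum total length of 65520 bytes (65516 bytes of payload). The helper names `pkt_line` and `split_over_pkt_lines` are hypothetical, and the multi-packet splitting only illustrates the kind of protocol extension Peff mentions; it is not something Git implements for filter specs.

```python
MAX_PKT_LEN = 65520            # max total pkt-line length, header included
MAX_PAYLOAD = MAX_PKT_LEN - 4  # 65516 bytes left for the payload


def pkt_line(payload):
    """Frame one payload as a pkt-line: 4 hex digits of length, then data."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload does not fit in a single pkt-line")
    # The length field counts the 4 header bytes as well as the payload.
    return b"%04x" % (len(payload) + 4) + payload


def split_over_pkt_lines(data, chunk=MAX_PAYLOAD):
    """Hypothetical extension: spread one large spec over several pkt-lines."""
    return [pkt_line(data[i:i + chunk]) for i in range(0, len(data), chunk)]


if __name__ == "__main__":
    print(pkt_line(b"hello\n"))
    big_spec = b"x" * 70000  # stand-in for an oversized filter spec
    print([len(p) for p in split_over_pkt_lines(big_spec)])
```

A spec larger than `MAX_PAYLOAD` is rejected by `pkt_line` itself, which is the limit the current implementation runs into; any real extension would also need a way for the receiver to know where the spec ends.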
That said...
At this moment, I think path-scoped filters have a lot of problems
that need solving before they can be used effectively in the wild.
I would prefer that we solve those problems before making the
feature more complicated. That's a tall ask, since these problems
do not have simple solutions.
...I agree with this. It is nice to put more power in the hands of the
clients, but we have to balance that with other issues like server
resource use. The approach so far has been to implement the simplest and
most efficient operations at the client-server level, and then have the
client build local features on top of that. So in this case, probably
requesting that _no_ trees are sent in the initial clone, and then
faulting them in as the client explores the tree using its own local
sparse definition. And I think that mostly works now.
Agreed. But we have to fetch these blobs one by one after the partial clone,
so why not reduce some of that extra network overhead if we can get the blobs
that are *most* needed in the first partial clone, right?
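For reference, the client-side workflow Peff describes can be approximated with existing Git options. The sketch below only builds the command lines and does not run them; the function name `partial_clone_commands`, the example URL, and the directory names are placeholders, and the choice of `--filter=tree:0` together with `--sparse` is an assumption about one way to get "no trees in the initial clone, then fault them in locally".

```python
def partial_clone_commands(url, dest, sparse_dirs, filter_spec="tree:0"):
    """Return argv lists for a filtered clone plus a local sparse definition.

    Nothing is executed here; pass each list to subprocess.run() to try it.
    """
    return [
        # Ask the server to omit objects matching the filter from the clone.
        ["git", "clone", "--filter=" + filter_spec, "--sparse", url, dest],
        # Define the sparse set locally; missing objects are fetched on demand.
        ["git", "-C", dest, "sparse-checkout", "set", *sparse_dirs],
    ]


if __name__ == "__main__":
    for argv in partial_clone_commands(
        "https://example.com/repo.git", "repo", ["src", "docs"]
    ):
        print(" ".join(argv))
```

Passing `filter_spec="blob:none"` instead keeps trees but omits blobs, which is the case where blobs are later fetched on demand, as discussed above.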
Though I admit I do not keep a close watch on the status of
partial-checkout features. I mostly always cared about it from the
server provider angle. ;)

-Peff
Thanks.

ZheNing Hu