Re: [PATCH v2 0/4] [RFC] repack: add --filter=
From: Taylor Blau <hidden>
Date: 2022-02-22 17:33:09
Hi Christian, I think my objections may be based on a misunderstanding of John and your original proposal. From reading [1], it seemed to me like a required step of this proposal was to upload the objects you want to filter out ahead of time, and then run `git repack -ad --filter=...`. So my concerns thus far have been around the lack of cohesion between (1) the filter which describes the set of objects uploaded to the HTTP server, and (2) the filter used when re-filtering the repository. If (1) and (2) aren't inverses of each other, then in the case where (2) leaves behind an object which wasn't caught by (1), we have lost that object. If instead the server used in your script at [1] is a stand-in for an ordinary Git remote, that changes my thinking significantly. See below for more details: On Tue, Feb 22, 2022 at 06:11:11PM +0100, Christian Couder wrote:
quoted
quoted
quoted
I share the same concern as Robert and Stolee do. But I think this issue goes deeper than just naming. Even if we called this `git repack --delete-filter` and only ran it with `--i-know-what-im-doing` flag, we would still be leaving repository corruption on the table, just making it marginally more difficult to achieve.My opinion on this is that the promisor object mechanism assumes by design that some objects are outside a repo, and that this repo shouldn't care much about these objects possibly being corrupted.For what it's worth, I am fine having a mode of repack which allows us to remove objects that we know are stored by a promisor remote. But this series doesn't do that, so users could easily run `git repack -d --filter=...` and find that they have irrecoverably corrupted their repository.In some cases we just know the objects we are removing are stored by a promisor remote or are replicated on different physical machines or both, so you should be fine with this.
I am definitely OK with a convenient way to re-filter your repository
locally so long as you know that the objects you are filtering out are
available via some promisor remote.
But perhaps I have misunderstood what this proposal is for. Reading
through John's original cover letter and the link to your demo script, I
understood that a key part of this was being able to upload the pack of
objects you were about to filter out of your local copy to some server
(not necessarily Git) over HTTP.
My hesitation so far has been based on that understanding. Reading these
patches, I don't see a mechanism to upload objects we're about to
expunge to a promisor remote.
But perhaps I'm misunderstanding: if you are instead assuming that the
existing set of remotes can serve any objects that we deleted, and this
is the way to delete them, then I am OK with that approach. But I think
either way, I am missing some details in the original proposal that
would have perhaps made it easier for me to understand what your goals
are.
In any case, this patch series doesn't seem to correctly set up a
promisor remote for me, since doing the following on a fresh clone of
git.git (after running "make"):
$ bin-wrappers/git repack -ad --filter=blob:none
$ bin-wrappers/git fsck
results in many "missing blob" and "missing link" lines out of output.
(FWIW, I think what's missing here is correctly setting up the affected
remote(s) as promisors and indicating that we're now in a filtered
setting when going from a full clone down to a partial one.)
If you are not fine with this because sometimes a user might use it without knowing, then why are you ok with commands deleting refs not checking that there isn't a regular repack removing dangling objects?
I'm not sure I totally understand your question, but my general sense has been "because we typically make it difficult / impossible to remove reachable objects". Thanks, Taylor [1]: https://gitlab.com/chriscool/partial-clone-demo/-/blob/master/http-promisor/server_demo.txt#L47-52