Re: [PATCH v2 0/4] [RFC] repack: add --filter=
From: John Cai <hidden>
Date: 2022-02-16 21:07:23
Hi Rob, glad these two efforts dovetail nicely! On 16 Feb 2022, at 10:39, Robert Coup wrote:
Hi John, On Wed, 9 Feb 2022 at 02:41, John Cai via GitGitGadget [off-list ref] wrote:quoted
This patch series makes partial clone more useful by making it possible to run repack to remove objects from a repository (replacing it with promisor objects). This is useful when we want to offload large blobs from a git server onto another git server, or even use an http server through a remote helper. In [A], a --refilter option on fetch and fetch-pack is being discussed where either a less restrictive or more restrictive filter can be used. In the more restrictive case, the objects that already exist will not be deleted. But, one can imagine that users might want the ability to delete objects when they apply a more restrictive filter in order to save space, and this patch series would also allow that.This all makes sense to me, and the implementation is remarkably short - gluing together capabilities that are already there, and writing tests. *But*, running `repack --filter` drops objects from the object db. That seems like a capability Git shouldn't idly expose without people understanding the consequences - mostly that they really have another copy elsewhere or they will lose data, and it won't necessarily be obvious for a long time. Otherwise it is a footgun.
Yes, great point. I think there was concern from Stolee around this as well.
I don't know whether that is just around naming (--delete-filter / --drop-filter / --expire-filter ?), and/or making the documentation very explicit that this isn't so much "omitting certain objects from a packfile" as irretrievably deleting objects.
Yeah, making the name very clear (I kind of like --delete-filter) would certainly help. Also, to have more protection we can either 1. add a config value that needs to be set to true for repack to remove objects (repack.allowDestroyFilter). 2. --filter is dry-run by default and prints out objects that would have been removed, and it has to be combined with another flag --destroy in order for it to actually remove objects from the odb.
Rob :)