Thread (64 messages), 18 authors, 2020-05-16

[TOPIC 3/17] Obliterate

From: James Ramsay <hidden>
Date: 2020-03-12 03:57:35

1. Jonathan N: sometimes people accidentally add a big file they don’t 
need. They have to use BFG and it’s a pain. Next time, maybe you just deal 
with it and ignore it. This happened to Chrome: some huge blob that was in 
the repo should no longer be in the repo, but don’t want to rewrite 
the history. Other use cases are confidential information, like 
passwords, credit card numbers, etc. Initial reaction: it’s already out 
there, rotate. Second reaction: if it’s a toxic blob it needs to be 
removed everywhere! What if someone taught kernel repo to

2. James: I’ve been in a lot of meetings with customers where they 
said it’s not possible to rotate the information that was leaked 
into the repo.

3. Demetr: How far back do we allow obliteration to go?

4. Jonathan N: there are indeed horrible real-world examples where 
things to be obliterated are from a long time ago.

5. James: real cost to changing object ids: Git and tools interacting 
with it really assume that history is immutable.

6. Elijah: replace refs help, but they are not supported by hosts like 
GitHub etc.

     a. Stolee: breaks commit graph because of generation numbers.
     b. Replace refs for blobs, then special packfile, there were edge 
cases.

7. Demetr: Backward compatibility: wouldn’t custom handling be 
problematic for old clients?

8. Jeff H: can we introduce a new type of object, a "revoked blob" if 
you will, that burns the original one but also holds the original SHA in 
the ODB?

9. Peff: what would this mean for signatures? New opportunity to forge 
signatures.

10. Jonathan N: if it’s a new entity, this means you’ve changed the content, 
which we want to avoid. Maybe a list of revoked blobs. If fsck notices 
one of them is missing, it should be happy. Protocol support: if someone 
tries to include a patch with it, just ignore it. Not great. Improvement 
would be to send a list of things I deliberately didn’t send. Could also 
communicate blobs to be deleted, but ignore that for v1. Learn from 
Mercurial, which has a very complicated signed revocation mechanism.

11. Brian: the remote can’t be trusted; à la leftpad, a maintainer could 
do something malicious, causing the repo to become invalid.

12. Jonathan N: main scenario I’m considering is trusted company 
remote.

13. Terry: partial clone could solve large files. Maybe the server could 
handle it by converting a normal clone into a partial one, and then handle 
the error if someone asks for that blob.

14. Jakub: one idea would simply be to treat this as a missing blob in a 
partial clone

15. Michael Haggerty: does this only apply to blobs? (Peff: no, commit 
messages can contain sensitive information; Johannes: trees contain file 
names which also can contain sensitive information)

16. Jonathan N: partial clone is not a solution for the desire to get 
rid of the blob on the server side.
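
To make the "list of revoked blobs" idea from item 10 concrete, here is a minimal sketch in Python. It is not Git's fsck and not a proposed design; it only models the rule discussed above: a missing object is acceptable if, and only if, it appears on a revocation list. The names ObjectStore, obliterate and check_connectivity, and the short object ids, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Toy stand-in for an object database (hypothetical, not Git's ODB)."""
    objects: dict = field(default_factory=dict)   # object id -> content
    revoked: set = field(default_factory=set)     # ids removed on purpose

    def obliterate(self, oid):
        """Drop the content but keep the id on the revocation list."""
        self.objects.pop(oid, None)
        self.revoked.add(oid)


def check_connectivity(store, referenced_ids):
    """Return ids that are missing and NOT explained by the revocation list.

    An fsck-like check under this model would report only these ids
    and stay quiet about objects that were deliberately revoked.
    """
    return sorted(
        oid for oid in referenced_ids
        if oid not in store.objects and oid not in store.revoked
    )


# Example: "bbb" was obliterated, "ccc" is simply missing.
store = ObjectStore(objects={"aaa": b"ok", "bbb": b"secret"})
store.obliterate("bbb")
unexplained = check_connectivity(store, ["aaa", "bbb", "ccc"])
```

In this model the object id of the revoked blob is unchanged, so trees and commits that reference it keep their ids; only the content is gone. Whether the revocation list itself can be trusted (item 11) is outside the sketch.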