Re: [silly] loose, pack, and another thing?
From: Jonathan Tan <hidden>
Date: 2023-09-28 21:40:17
Junio C Hamano [off-list ref] writes:
Just wondering if it would help to have the third kind of object representation in the object database, sitting next to loose objects and packed objects, say .git/objects/verbatim/<hex-object-name> for the contents and .git/objects/verbatim/<hex-object-name>.type that records "blob", "tree", "commit", or "tag" (in practice, I would expect huge "blob" objects would be the only ones that use this mechanism). The contents will be stored verbatim without compression and without any object header (i.e., the usual "<type> <length>\0") and the file could be "ln"ed (or "cow"ed if the underlying filesystem allows it) to materialize it in the working tree if needed.
This sounds like a useful feature. We would probably want to use "ln" or "cow" wherever we currently use streaming (stream_blob_to_fd() in streaming.h), so hopefully we won't need to increase the number of ways in which we can write an object to the worktree (just change the streaming code to write to a filename instead of an fd).
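To make the "write to a filename" idea concrete, here is a rough sketch in Python rather than Git's C. The helper name materialize() and its arguments are made up for illustration, and the .git/objects/verbatim/ layout is only the one proposed above, not something Git implements. It tries "ln" first and falls back to a plain copy; "cow" is left out because reflinks are filesystem-specific.

```python
import os
import shutil


def materialize(objects_dir, hex_name, worktree_path):
    """Hypothetical helper: place a verbatim object at worktree_path.

    Assumes the proposed layout <objects_dir>/verbatim/<hex-object-name>.
    Returns "ln" if a hard link was created, "copy" otherwise.
    """
    src = os.path.join(objects_dir, "verbatim", hex_name)
    try:
        # "ln": the worktree file shares an inode with the stored object.
        os.link(src, worktree_path)
        return "ln"
    except OSError:
        # Cross-device, unsupported filesystem, existing target, etc.
        shutil.copyfile(src, worktree_path)
        return "copy"
```

One thing the sketch does not address: with a hard link, an in-place edit of the worktree file would also change the stored object, so a real implementation would need some protection against that.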
"fsck" needs to be told about how to verify them. Create the object header in-core and hash that, followed by the contents of that file, and make sure the result matches the <hex-object-name> part of the filename, or something like that.
Yeah, this sounds like what index-pack is doing - the hash algo can take the contents of one buffer (a header that we synthesize ourselves), and then take the contents of another buffer (the file contents).
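A minimal sketch of that check, again in Python rather than Git's C and assuming SHA-1 object names. The function names are hypothetical, and the <hex-object-name> / <hex-object-name>.type layout is the one proposed above. The header is synthesized in memory and fed to the hash first, then the file contents are streamed in chunks so a huge blob never has to fit in memory.

```python
import hashlib
import os


def verbatim_object_name(path, obj_type, algo="sha1", chunk_size=1 << 20):
    """Hash a synthesized "<type> <length>\\0" header, then the raw file."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        # First buffer: the header we build ourselves.
        h.update(f"{obj_type} {size}\0".encode("ascii"))
        # Second buffer(s): the verbatim contents, streamed.
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()


def fsck_verbatim(objects_dir, hex_name):
    """Hypothetical check: does the file name match the computed hash?"""
    base = os.path.join(objects_dir, "verbatim", hex_name)
    with open(base + ".type") as f:
        obj_type = f.read().strip()
    return verbatim_object_name(base, obj_type) == hex_name
```

For example, a verbatim file containing "hello\n" with type "blob" hashes to ce013625030ba8dba906f756967f9e9ca394464a, the same name "git hash-object" gives that content.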