Re: [PATCH v2 5/5] Reftable support for git-core
From: Jeff King <hidden>
Date: 2020-01-29 10:47:57
On Tue, Jan 28, 2020 at 04:56:26PM +0100, Han-Wen Nienhuys wrote:
> JGit currently implements what we have here, as this is what's spelled
> out in the spec that Shawn posted back in the day. It's probably
> acceptable to change this, though, as the reftable support has only
> landed in JGit very recently and will probably appear very experimental
> to folks.
>
> How would the layout be then? We'll have
>
>   HEAD - dummy file
>   reftable/ - the tables
>   refs/ - dummy dir
>
> Where shall we store the reftable list? Maybe in a file called
> reftable-list.
>
> If we have both HEAD/refs + (reftable/reftable-list), what should we put
> there to ensure that no git version actually manages to use the
> repository? (What happens if someone deletes the version setting from
> the .git/config file?)
Yeah, it would be nice to have something that an older version of Git
would totally choke on, but I'm not sure we have a lot of leeway. What
we put in HEAD has to be syntactically legitimate enough to appease
validate_headref(), so our options are either "ref:
refs/something/bogus" or an object hash that we don't have (e.g.,
0{40}). The former would be preferable because it would (in theory)
prevent us from writing to HEAD, as well.
I wondered what would happen if you put in a syntactically invalid ref,
like "ref: refs/.not/.valid" (leading dots are not allowed in path
components of refnames). It does cause _some_ parts of Git to choke, but
sadly "git update-ref HEAD $sha1" actually writes to .git/refs/.not/.valid.
Even "refs/../../dangerous" doesn't give it pause. Yikes. It seems we're
pretty willing to accept symref destinations without further checking.
Making "refs" a file instead of a directory does work nicely, as any
attempts to read or write would get ENOTDIR. And we can fool
is_git_directory() as long as it's marked executable. That's OK on POSIX
systems, but I'm not sure how it would work on Windows (or maybe it
would work just fine, since we presumably just say "yep, everything is
executable").
So perhaps that's enough, and what we put in HEAD won't matter (since
nobody will be able to write into refs/ anyway).
>> But that raises a question: how ready are reftables to handle non-sha1
>> object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
>> format actually has binary sha1s, right? In theory if those all become
>> the_hash_algo->rawsz, then it might "Just Work" to read and write
>> slightly larger entries.
>
> The format fixes hashes in the reftable at 20 bytes, and there is not
> enough framing information to just write more data. We'll have to encode
> the hash size in the version number somehow, e.g. we could use the
> high-order bit of the version byte to encode it. But it needs a new
> version of the spec. I think it's premature to do this while v1 of
> reftable isn't in git-core yet.
I don't know that we technically need the reftables file to say how long
the hashes are. The git config will tell us which hash we're using, and
everything else is supposed to follow. So I think it would work OK as
long as you're able to be told by the rest of Git that hashes are N
bytes, and just use that to compute the fixed-size records. That said,
it might make for easier debugging if the reftables file declares the
size it assumes.

-Peff