[NOTES 01/11] SHA-256 and interoperability work

From: Taylor Blau <hidden>
Date: 2025-10-06 19:18:05
Topic: SHA256 and interoperability work
Leader: brian
10:15am-10:45am PT

* lot of work to do
* brian is working on it
* it's progressing, not sure if we can get everything done by 3.0
* how to deal with submodules
	 * you can produce a split history
	 * accept, document, ?
	 * we need to have mapping on server or client
	 * if someone pushed one commit in sha1 and a different one in 256, we can end up
		 with divergent histories that could produce security issues
	 * private repos used as open-core style submodules make this difficult
	 * could have the server query, client derive mapping
	 * server could also be malicious
* if you're converting, how does that work in gpg signatures?
	 * we have a way to map both signatures
	 * if you're in compatibility mode, it will produce signatures for both
	 * what about for older histories, how can it be verified if it's only valid
		 for sha1?
			* it can be verified but can't be re-signed
	 * for converting, can that work?
			* converting will retain the sha1 signature
* what is the simplest user journey?
	 * I have a clone of a repo in sha1, am I expected to run a conversion locally
		 and then I can talk to GH in 256 protocol?
			* you will create a new repo with 256 with sha1 compatibility and clone
				into that, which will convert it into both algos
			* download the data again?
				 * clone it to another directory locally
				 * it will preserve the sha1 repo and create the compatibility layer
			* let's say the local one has a submodule, clone locally including the
				submodule?
				 * yes, the conversion script will convert the submodule as well and
					 you'll have both ids
			* if I do a fetch, which do I need
				 * you need a mapping if you're talking to a server with the other algo
			* the mapping is only needed for the server if it wants to be forward
				facing?
			* with the mapping, is it only commits or all objects?
				 * all objects
			* if someone trusts github, they can just consume its mapping?
				 * the server and client will do their own mapping
* what happens if nobody has the submodule anymore? commit from 10 years ago but
	nobody has that submodule anymore, how do you make a 256 tree out of that?
	 * pick one at random, it doesn't matter
			* but you can't match everyone else
	 * we've chosen to use divergent history in this case
	 * Same issue exists with LFS objects
* if you have the old submodules,
* recursive/cyclic submodules?
	 * it's something we need to handle, don't have a great plan but it could be
		 done
	 * plan is to maybe have some pool
			* you have to convert the submodule up until that point, then convert them
				piecewise
* have you thought about mix/match where one uses sha1 and the other uses 256?
	 * we can't distinguish the size of the object id vs filename
* right now you're doing the work, are you thinking of allowing another hash
	algo without having these issues again?
	 * the way the design works now is that we have two algos - main and
		 compatibility, but designed to accept multiple algos. if we switch to SHA3-512
		 at some point for example, we could add another compat algo - it's some
		 work but the approach doesn't assume much about the specific algorithm
* steiny thought it could be useful to add a third algo not for security but
	speed
	 * gh has the insecure non-crypto variants
	 * problem is always client support
	 * corporate controlled repo often also has control of the clients - so maybe
		 less of a security issue but depends
* can you put a sha1 link inside a 256 tree
	 * maybe an extra bit in the mode, some other interesting horrible thoughts
	 * would it make submodule problems go away if you could just carry the other
		 forever until the downstream decides to switch
	 * solves the submodule problem but not LFS problem?
			* LFS might be easier, you don't need to have the object to convert yours
			* assuming you have the object still
	 * brian not 100% against it
			* if I could do a 256 repo with a 256 submodule, you could parse it back,
				but if you do that, it's a different size and not usable by older
				versions of git
	 * if we were clever, sha1 trees hold sha1, 256 holds 256, and only when you
		 have a sha1 tree inside a 256 one would we use some new format
			* the problem is you still end up with stuff that doesn't work with older
				versions
			* degrades gracefully like a mode bit, worst case is that it checks out
				weird filenames?
			* write it out, take it to the list
	 * we discussed upgrading the tree object format, but it's so tight
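
A minimal Python sketch, not something presented in the session, of three
points raised above: the mapping has to cover all objects, a missing submodule
blocks conversion, and the tree format leaves no room to mix id sizes. It
relies only on the object layout Git uses: an id is the hash of
`<type> <size>\0` plus the body, and a tree entry is `<mode> <name>\0` followed
by the raw id with no length field. The helper names, the `vendor` path, and the
`11...`/`22...` submodule ids are made up for illustration, and entries are
assumed to already be in Git's sort order.

```python
import hashlib


def object_id(algo, obj_type, body):
    # A Git object id is the hash of "<type> <size>\0" followed by the body.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).digest()


def tree_body(entries):
    # Each entry is "<mode> <name>\0" followed by the raw id, with no length field.
    return b"".join(mode + b" " + name + b"\0" + oid for mode, name, oid in entries)


def parse_tree(body, oid_len):
    # The id width is not stored in the tree; the reader must already know it.
    entries, pos = [], 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        oid = body[nul + 1 : nul + 1 + oid_len]
        entries.append((body[pos:space], body[space + 1 : nul], oid))
        pos = nul + 1 + oid_len
    return entries


def convert_tree(entries, mapping):
    # Rewriting a tree needs the other-algorithm id of every entry,
    # including gitlinks (mode 160000) that point into a submodule.
    converted = []
    for mode, name, sha1_oid in entries:
        if sha1_oid not in mapping:
            raise KeyError(f"no SHA-256 id known for {name.decode()}")
        converted.append((mode, name, mapping[sha1_oid]))
    return converted


# Blobs are easy: same bytes, hashed twice.
blob = b"hello\n"
blob_sha1 = object_id("sha1", "blob", blob)
blob_sha256 = object_id("sha256", "blob", blob)
mapping = {blob_sha1: blob_sha256}

# A tree with one blob and one gitlink to a (made-up) submodule commit.
submodule_sha1 = bytes.fromhex("11" * 20)
sha1_entries = [
    (b"100644", b"README", blob_sha1),
    (b"160000", b"vendor", submodule_sha1),
]
sha1_tree_id = object_id("sha1", "tree", tree_body(sha1_entries))

# Without the submodule there is no SHA-256 id to put in the gitlink.
try:
    convert_tree(sha1_entries, mapping)
    missing_submodule = False
except KeyError:
    missing_submodule = True

# Once the submodule has been converted (id made up here), the tree converts too.
mapping[submodule_sha1] = bytes.fromhex("22" * 32)
sha256_entries = convert_tree(sha1_entries, mapping)
sha256_tree_id = object_id("sha256", "tree", tree_body(sha256_entries))
```

Picking an arbitrary id for the missing submodule would let `convert_tree`
succeed, but anyone who picks a different id gets a different tree id, which is
the divergent history mentioned above. `parse_tree` only round-trips when it is
given the right `oid_len`, which is why a sha1 id inside a 256 tree would need
some extra marker such as a new mode bit.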