[NOTES 01/11] SHA-256 and interoperability work
From: Taylor Blau <hidden>
Date: 2025-10-06 19:18:05
Topic: SHA256 and interoperability work
Leader: brian
10:15am-10:45am PT

* lot of work to do
* brian is working on it
* it's progressing, not sure if we can get everything done by 3.0
* how to deal with submodules
* you can produce a split history
* accept, document, ?
* we need to have mapping on server or client
* if someone pushed one commit in sha1 and a different one in 256, we can end up with divergent histories that could produce security issues
* some private repos for open-core type submodules make this difficult with submodules
* could have the server query, client derive mapping
* server could also be malicious
* if you're converting, how does that work with gpg signatures?
* we have a way to map both signatures
* if you're in compatibility mode, it will produce signatures for both
* what about for older histories, how can it be verified if it's only valid for sha1?
* it can be verified but can't be re-signed
* for converting, can that work?
* converting will retain the sha1 signature
* what is the simplest user journey?
* I have a clone of a repo in sha1, am I expected to run a conversion locally and then I can talk to GH in 256 protocol?
* you will create a new repo with 256 with sha1 compatibility and clone into that, which will convert it into both algos
* download the data again?
* clone it to another directory locally
* it will preserve the sha1 repo and create the compatibility layer
* let's say the local one has a submodule, clone locally including the submodule?
* yes, the conversion script will convert the submodule as well and you'll have both ids
* if I do a fetch, which do I need?
* you need a mapping if you're talking to a server with the other algo
* the mapping is only needed for the server if it wants to be forward facing?
* with mapping, is it only commits or all objects?
* all objects
* if someone trusts GitHub, they can just consume its mapping?
* the server and client will do their own mapping
* what happens if nobody has the submodule anymore?
  commit from 10 years ago but nobody has that submodule anymore, how do you make a 256 tree out of that
* pick one at random, it doesn't matter
* but you can't match everyone else
* we've chosen to use divergent history in this case
* same issue exists with LFS objects
* if you have the old submodules,
* recursive/cyclic submodules?
* it's something we need to handle, don't have a great plan but it could be done
* plan is to maybe have some pool
* you have to convert the submodule up until that point, then convert them piecewise
* have you thought about mix/match where one uses sha1 and the other uses 256?
* we can't distinguish the size of the object id vs filename
* right now you're doing the work, are you thinking of allowing another hash algo without having these issues again?
* the way the design works now is that we have two algos - main and compatibility - but it is designed to accept multiple algos. if we switch to SHA3-512 at some point, for example, we could add another compat algo - it's some work but the approach doesn't assume much about the specific algorithm
* steiny thought it could be useful to add a third algo not for security but speed
* gh has the insecure non-crypto variants
* problem is always client support
* corporate controlled repo often also has control of the clients - so maybe less of a security issue but depends
* can you put a sha1 link inside a 256 tree?
* maybe an extra bit in the mode, some other interesting horrible thoughts
* would it make submodule problems go away if you could just carry the other forever until the downstream decides to switch?
* solves the submodule problem but not LFS problem?
* LFS might be easier, you don't need to have the object to convert yours
* assuming you have the object still
* brian not 100% against it
* if I could do a 256 repo with a 256 submodule, you could parse it back, but if you do that, it's a different size and not usable by older versions of git
* if we were clever, sha1 trees hold sha1, 256 trees hold 256, and it's only when you have a sha1 tree inside a 256 that we would use some new format
* the problem is you still end up with stuff that doesn't work with older versions
* degrades gracefully like a mode bit, worst case is that it checks out weird filenames?
* write it out, take it to the list
* we discussed upgrading the tree object format, but it's so tight
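
To make the mapping and tree-format points above concrete, here is a minimal Python sketch (not Git's actual implementation, which is in C and stores its mapping differently). It relies on two facts about the object format: an object ID is the hash of a `<type> <size>\0` header plus the body, and a tree entry is `<mode> <name>\0<raw id>` with no length field, so the ID size is implied by the repository's algorithm. The helper names (`object_id`, `parse_tree`, `build_tree`, `convert_tree`, `MissingMapping`) and the sample IDs are hypothetical, chosen only for illustration.

```python
import hashlib

# Raw object ID size in bytes; a tree entry carries no length field,
# so the parser has to know the algorithm up front.
HASH_LEN = {"sha1": 20, "sha256": 32}


def object_id(algo, obj_type, body):
    """Hash '<type> <size>\\0' + body, the way Git names an object."""
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.new(algo, header + body).hexdigest()


def parse_tree(body, algo):
    """Yield (mode, name, hex_id) from a raw tree body."""
    n = HASH_LEN[algo]
    pos = 0
    while pos < len(body):
        sp = body.index(b" ", pos)    # end of mode
        nul = body.index(b"\0", sp)   # end of name
        raw = body[nul + 1:nul + 1 + n]
        yield body[pos:sp], body[sp + 1:nul], raw.hex()
        pos = nul + 1 + n


def build_tree(entries):
    """Serialize (mode, name, hex_id) tuples back into a raw tree body."""
    return b"".join(m + b" " + name + b"\0" + bytes.fromhex(oid)
                    for m, name, oid in entries)


class MissingMapping(Exception):
    """Raised when an entry has no known ID in the other algorithm."""


def convert_tree(sha1_body, sha1_to_sha256):
    """Rewrite a SHA-1 tree into SHA-256 form; every entry needs a mapping."""
    out = []
    for mode, name, oid in parse_tree(sha1_body, "sha1"):
        if oid not in sha1_to_sha256:
            kind = "submodule commit" if mode == b"160000" else "object"
            where = name.decode(errors="replace")
            raise MissingMapping(f"no SHA-256 mapping for {kind} {oid} at {where}")
        out.append((mode, name, sha1_to_sha256[oid]))
    return build_tree(out)


# A blob can be mapped by rehashing its content under both algorithms.
blob = b"hello\n"
mapping = {object_id("sha1", "blob", blob): object_id("sha256", "blob", blob)}

# Hypothetical SHA-1 commit ID from a submodule nobody has anymore.
sub_commit = "ab" * 20
tree_sha1 = build_tree([
    (b"100644", b"README", object_id("sha1", "blob", blob)),
    (b"160000", b"vendor", sub_commit),  # mode 160000 marks a submodule link
])

try:
    convert_tree(tree_sha1, mapping)
except MissingMapping as err:
    print(err)
```

The blob entry converts because its content is at hand, while the submodule entry only records a commit ID from another repository. Without that repository (or a mapping from someone who has it) there is no way to derive the SHA-256 ID, which is the situation where the notes settle on divergent history. The fixed-width slice in `parse_tree` also shows why mixing ID sizes within one tree is ambiguous for a parser that assumes a single algorithm.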