Re: [PATCH 03/15] cache: add an algo member to struct object_id

[PATCH 00/15] SHA-256 / SHA-1 interop, part 1 · brian m. carlson <hidden> · 2021-04-10
[PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm · brian m. carlson <hidden> · 2021-04-10
Re: [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm · Denton Liu <hidden> · 2021-04-15
Re: [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm · brian m. carlson <hidden> · 2021-04-15
Re: [PATCH 01/15] sha1-file: allow hashing objects literally with any algorithm · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-16
[PATCH 04/15] Always use oidread to read into struct object_id · brian m. carlson <hidden> · 2021-04-10
[PATCH 05/15] hash: add a function to finalize object IDs · brian m. carlson <hidden> · 2021-04-10
[PATCH 07/15] builtin/pack-redundant: avoid casting buffers to struct object_id · brian m. carlson <hidden> · 2021-04-10
[PATCH 08/15] cache: compare the entire buffer for struct object_id · brian m. carlson <hidden> · 2021-04-10
Re: [PATCH 08/15] cache: compare the entire buffer for struct object_id · Chris Torek <hidden> · 2021-04-11
Re: [PATCH 08/15] cache: compare the entire buffer for struct object_id · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-11
Re: [PATCH 08/15] cache: compare the entire buffer for struct object_id · brian m. carlson <hidden> · 2021-04-11
[PATCH 10/15] hash: provide per-algorithm null OIDs · brian m. carlson <hidden> · 2021-04-10
Re: [PATCH 10/15] hash: provide per-algorithm null OIDs · Junio C Hamano <hidden> · 2021-04-11
Re: [PATCH 10/15] hash: provide per-algorithm null OIDs · brian m. carlson <hidden> · 2021-04-11
[PATCH 09/15] hash: set and copy algo field in struct object_id · brian m. carlson <hidden> · 2021-04-10
Re: [PATCH 09/15] hash: set and copy algo field in struct object_id · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-11
Re: [PATCH 09/15] hash: set and copy algo field in struct object_id · brian m. carlson <hidden> · 2021-04-11
Re: [PATCH 09/15] hash: set and copy algo field in struct object_id · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-11
Re: [PATCH 09/15] hash: set and copy algo field in struct object_id · brian m. carlson <hidden> · 2021-04-11
[PATCH 0/2] C99: harder dependency on variadic macros · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-12
[PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-12
Re: [PATCH 2/2] C99 support: remove non-HAVE_VARIADIC_MACROS code · Jonathan Nieder <hidden> · 2021-05-21
[PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-12
Re: [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code · Jeff King <hidden> · 2021-04-13
Re: [PATCH 1/2] git-compat-util.h: clarify comment on GCC-specific code · Jonathan Nieder <hidden> · 2021-05-21
Re: [PATCH 0/2] C99: harder dependency on variadic macros · Bagas Sanjaya <hidden> · 2021-04-12
Re: [PATCH 0/2] C99: harder dependency on variadic macros · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-12
Re: [PATCH 0/2] C99: harder dependency on variadic macros · brian m. carlson <hidden> · 2021-04-12
[PATCH v2 0/2] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code · Ævar Arnfjörð Bjarmason <hidden> · 2022-01-28
[PATCH v2 1/2] git-compat-util.h: clarify GCC v.s. C99-specific in comment · Ævar Arnfjörð Bjarmason <hidden> · 2022-01-28
[PATCH v2 2/2] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code · Ævar Arnfjörð Bjarmason <hidden> · 2022-01-28
[PATCH v3 0/3] C99: remove dead !HAVE_VARIADIC_MACROS code · Ævar Arnfjörð Bjarmason <hidden> · 2022-02-19
[PATCH v3 1/3] git-compat-util.h: clarify GCC v.s. C99-specific in comment · Ævar Arnfjörð Bjarmason <hidden> · 2022-02-19
[PATCH v3 3/3] trace.h: remove never-used TRACE_CONTEXT · Ævar Arnfjörð Bjarmason <hidden> · 2022-02-19
Re: [PATCH v3 3/3] trace.h: remove never-used TRACE_CONTEXT · Junio C Hamano <hidden> · 2022-02-20
Re: [PATCH v3 3/3] trace.h: remove never-used TRACE_CONTEXT · Ævar Arnfjörð Bjarmason <hidden> · 2022-02-20
[PATCH v3 2/3] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code · Ævar Arnfjörð Bjarmason <hidden> · 2022-02-19
[PATCH v4 0/2] C99: remove dead !HAVE_VARIADIC_MACROS code · Ævar Arnfjörð Bjarmason <hidden> · 2022-02-21
[PATCH v4 1/2] git-compat-util.h: clarify GCC v.s. C99-specific in comment · Ævar Arnfjörð Bjarmason <hidden> · 2022-02-21
[PATCH v4 2/2] C99: remove hardcoded-out !HAVE_VARIADIC_MACROS code · Ævar Arnfjörð Bjarmason <hidden> · 2022-02-21
Re: [PATCH 09/15] hash: set and copy algo field in struct object_id · Junio C Hamano <hidden> · 2021-04-12
Re: [PATCH 09/15] hash: set and copy algo field in struct object_id · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-12
[PATCH 11/15] builtin/show-index: set the algorithm for object IDs · brian m. carlson <hidden> · 2021-04-10
[PATCH 12/15] commit-graph: don't store file hashes as struct object_id · brian m. carlson <hidden> · 2021-04-10
[PATCH 15/15] hex: print objects using the hash algorithm member · brian m. carlson <hidden> · 2021-04-10
[PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm · brian m. carlson <hidden> · 2021-04-10
Re: [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-11
Re: [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm · brian m. carlson <hidden> · 2021-04-11
Re: [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-16
Re: [PATCH 02/15] builtin/hash-object: allow literally hashing with a given algorithm · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-16
[PATCH 03/15] cache: add an algo member to struct object_id · brian m. carlson <hidden> · 2021-04-10
Re: [PATCH 03/15] cache: add an algo member to struct object_id · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-11
Re: [PATCH 03/15] cache: add an algo member to struct object_id · brian m. carlson <hidden> · 2021-04-11
Re: [PATCH 03/15] cache: add an algo member to struct object_id · Derrick Stolee <hidden> · 2021-04-13
Re: [PATCH 03/15] cache: add an algo member to struct object_id · brian m. carlson <hidden> · 2021-04-14
Re: [PATCH 03/15] cache: add an algo member to struct object_id · Ævar Arnfjörð Bjarmason <hidden> · 2021-04-15
Re: [PATCH 03/15] cache: add an algo member to struct object_id · brian m. carlson <hidden> · 2021-04-15
[PATCH 06/15] Use the final_oid_fn to finalize hashing of object IDs · brian m. carlson <hidden> · 2021-04-10
[PATCH 13/15] builtin/pack-objects: avoid using struct object_id for pack hash · brian m. carlson <hidden> · 2021-04-10
[PATCH 14/15] hex: default to the_hash_algo on zero algorithm value · brian m. carlson <hidden> · 2021-04-10

From: brian m. carlson <hidden>
Date: 2021-04-11 21:38:00

On 2021-04-11 at 11:55:57, Ævar Arnfjörð Bjarmason wrote:

On Sat, Apr 10 2021, brian m. carlson wrote:

quoted

Now that we're working with multiple hash algorithms in the same repo,
it's best if we label each object ID with its algorithm so we can
determine how to format a given object ID. Add a member called algo to
struct object_id.

Signed-off-by: brian m. carlson <redacted>
---
 hash.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hash.h b/hash.h
index 3fb0c3d400..dafdcb3335 100644
--- a/hash.h
+++ b/hash.h

@@ -181,6 +181,7 @@ static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
 
 struct object_id {
 	unsigned char hash[GIT_MAX_RAWSZ];
+	int algo;

Curiosity since I'm not as familiar as you with the multi-hash support
by far:

So struct object_id is GIT_MAX_RAWSZ, not two types of structs for
GIT_SHA1_RAWSZ and GIT_SHA256_RAWSZ. That pre-dates this series because
we'd like to not deal with two types of objects everywhere for SHA-1 and
SHA-256. Makes sense.

Before this series we'd memcmp them up to their actual length, but the
last GIT_MAX_RAWSZ-GIT_SHA1_RAWSZ would be uninitialized

Now we pad them out, so the last 96 bits of every SHA1 are 0000...;
Couldn't we also tell which hash an object is by memcmp-ing those last N
bits and see if they're all zero'd?

That makes a lot of assumptions about the security of the hash
algorithm that I don't want to make here.  If anyone can ever find a
SHA-256 hash with trailing 96 bits zero, then they can confuse it with a
SHA-1 hash.  That means that our security level goes from 128 bits to 96
bits.  It's also a nonstandard construction.

More importantly, it results in the null OID being treated as a SHA-1
OID.  Because we do print the null OID in some cases, we're going to
break a lot of output formats if we print all the rest of the OIDs with
64 characters and then the null OID with 40.  That's to say nothing of
the problems in binary formats.

The reason we pad these objects is because our hashmaps are broken if we
don't.  I don't remember all the gory details, but it was obvious to me
that if they weren't consistently equal, things were going to be broken.
That's the only reason, not theoretical purity.

Feels a bit hackish, and we'd need to reconsider that method if we'd
ever support other same-length hashes.

My hope is that we don't need to do this, but we do have SHA-3 to serve
as a backup for SHA-2.  If quantum computers don't progress
substantially, SHA-3-256 is definitely a viable candidate for
replacement if anything ever happens to SHA-256.

But OTOH having these objects all padded out in memory to the same
length, but having to carry around a "what hash algo" is it yields the
arguably weird hack of having a per-hash NULL_OID, which has never been
an actual object of any hash type, but just a pseudo-object.

Unfortunately, as I mentioned above, we need to have two null OIDs to
handle printing things out.  It's inconvenient, I agree.

I abandoned it as insany sillyness after playing with it for about a
day, but it did reveal that much of the hash code now can assume
internal length == formatting length, which is why I'm 3 paragraphs into
this digression, i.e. maybe some of the code structure also makes having
a NULL_OID always be 256-bits when we want to format it as 160/256
painful...

We'll always format based on the algorithm in the OID.  That's the
simplest way to make things work because unfortunately we may end up
with both types of OIDs in the same code paths (as we're converting one
to the other) and otherwise our printing functions need a lot of special
handling and even more variants than they have today.
-- 
brian m. carlson (he/him or they/them)
Houston, Texas, US

Attachments

signature.asc [application/pgp-signature] 263 bytes

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help