Re: [PATCH v3 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants
From: Jeff King <hidden>
Date: 2025-01-18 12:28:32
On Fri, Jan 17, 2025 at 05:03:07PM -0500, Taylor Blau wrote:
+ Note that hash_algo_by_ptr() needs an adjustment to allow passing in the
+ unsafe variant of a hash function. All other query functions on the
+ hash_algos array will continue to return the safe variants of any
+ function.
+
Suggested-by: Jeff King [off-list ref]
Signed-off-by: Taylor Blau [off-list ref]
@@ hash.h: struct git_hash_algo {
};
extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
-@@ hash.h: static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
- return p - hash_algos;
+@@ hash.h: int hash_algo_by_length(int len);
+ /* Identical, except for a pointer to struct git_hash_algo. */
+ static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
+ {
+- return p - hash_algos;
++ size_t i;
++ for (i = 0; i < GIT_HASH_NALGOS; i++) {
++ const struct git_hash_algo *algop = &hash_algos[i];
++ if (p == algop || (algop->unsafe && p == algop->unsafe))
++ return i;
++ }
++ return GIT_HASH_UNKNOWN;
}OK, so this version introduces the loop we discussed earlier. I think we can probably dismiss any performance loss as theoretical unless somebody can think of a good way to measure. It seems like worrying about it is probably a premature micro-optimization. It is a little quirky that it loses the transitive nature of hash_algo_by_ptr() and hash_algo_by_id(). So this is unsafe: /* start with unsafe variant */ algo = unsafe_hash_algo(the_hash_algo); algo->init_fn(...); /* returns GIT_HASH_SHA1, even for the unsafe variant */ id = hash_algo_by_ptr(algo); /* returns the safe variant */ algo = hash_algo_by_id(id); /* oops, this is mix-and-matching */ alog->update_fn(...); Now obviously that sort of round-trip is a silly thing to do in a single function. Could it happen across a larger call-chain, where the id is stored in a struct and passed somewhere far away? I guess so, but it also seems kind of unlikely. It does make me wonder if hash_algo_by_ptr() should just return UNKNOWN for the unsafe variant. Then we'd at least get a loud error from the caller (as opposed to the code before your patch, which is undefined behavior). I dunno. -Peff