Thread (55 messages) 55 messages, 4 authors, 2025-01-23

Re: [PATCH v3 0/8] hash: introduce unsafe_hash_algo(), drop unsafe_ variants

From: Jeff King <hidden>
Date: 2025-01-18 12:28:32

On Fri, Jan 17, 2025 at 05:03:07PM -0500, Taylor Blau wrote:
    +    Note that hash_algo_by_ptr() needs an adjustment to allow passing in the
    +    unsafe variant of a hash function. All other query functions on the
    +    hash_algos array will continue to return the safe variants of any
    +    function.
    +
         Suggested-by: Jeff King [off-list ref]
         Signed-off-by: Taylor Blau [off-list ref]
     
    @@ hash.h: struct git_hash_algo {
      };
      extern const struct git_hash_algo hash_algos[GIT_HASH_NALGOS];
      
    -@@ hash.h: static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
    - 	return p - hash_algos;
    +@@ hash.h: int hash_algo_by_length(int len);
    + /* Identical, except for a pointer to struct git_hash_algo. */
    + static inline int hash_algo_by_ptr(const struct git_hash_algo *p)
    + {
    +-	return p - hash_algos;
    ++	size_t i;
    ++	for (i = 0; i < GIT_HASH_NALGOS; i++) {
    ++		const struct git_hash_algo *algop = &hash_algos[i];
    ++		if (p == algop || (algop->unsafe && p == algop->unsafe))
    ++			return i;
    ++	}
    ++	return GIT_HASH_UNKNOWN;
      }
OK, so this version introduces the loop we discussed earlier. I think we
can probably dismiss any performance loss as theoretical unless somebody
can think of a good way to measure. It seems like worrying about it is
probably a premature micro-optimization.

It is a little quirky that it loses the transitive nature of
hash_algo_by_ptr() and hash_algo_by_id(). So this is unsafe:

  /* start with unsafe variant */
  algo = unsafe_hash_algo(the_hash_algo);
  algo->init_fn(...);

  /* returns GIT_HASH_SHA1, even for the unsafe variant */
  id = hash_algo_by_ptr(algo);

  /* returns the safe variant */
  algo = hash_algo_by_id(id);

  /* oops, this is mix-and-matching */
  alog->update_fn(...);

Now obviously that sort of round-trip is a silly thing to do in a single
function. Could it happen across a larger call-chain, where the id is
stored in a struct and passed somewhere far away? I guess so, but it
also seems kind of unlikely.

It does make me wonder if hash_algo_by_ptr() should just return UNKNOWN
for the unsafe variant. Then we'd at least get a loud error from the
caller (as opposed to the code before your patch, which is undefined
behavior). I dunno.

-Peff
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help