Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode

From: Jeff King <hidden>
Date: 2024-11-07 01:39:22

Possibly related (same subject, not in this thread)

2024-11-07 · Re: [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode · Junio C Hamano <hidden>
2024-11-05 · [PATCH v2 1/2] t/helper/test-sha1: prepare for an unsafe mode · Taylor Blau <hidden>

On Thu, Nov 07, 2024 at 12:48:26AM +0000, brian m. carlson wrote:

quoted

I would have expected that there are different algorighm instances,
and one of them would be "unsafe SHA-1", among "normal SHA-1" and
"SHA-256" (as the last one would not even have unsafe_init_fn
method), and the callers that want to measure the performance of
each algorithm would simply pick one of these instances and go
through the usual "init", "update", "final" flow, regardless of the
"unsafe" ness of the algorithm.

[...]

quoted

... I didn't expect any of these "if (unsafe) .. else .." switches.

I don't think we can remove those, and here's why.  Certainly Taylor can
correct me if I'm off base, though.

In the normal case, our hash struct is a union, and different
implementations can have a different layout.  A SHA-1 implementation
will usually keep track of a 64-bit size, 5 32-bit words, and up to 64
bytes of data that might need to be processed.  Maybe SHA-1-DC, our safe
implementation, stores the size first, but our unsafe implementation
is OpenSSL and it stores the 5 hash words first, or whatever.

If we use the same update and final functions, we'll end up with
incorrect data because we'll have initialized the union contents with
data for one implementation but are trying to update or finalize it with
different data, which in the very best case will just produce broken
results, and might just cause a segfault (if one of the implementations
stores a pointer instead of storing the data in the union).

I don't think Junio is proposing mixing the functions on a single data
type. Which, as you note, would be a disaster. I think the idea is for a
separate git_hash_algo struct entirely, that can be slotted in as a
pointer (since git_hash_algo is already essentially a virtual type).

That gets weird as you say here:

We could certainly adopt a different hash algorithm structure for safe
and unsafe code, but we have a lot of places that assume that there's
just one structure per algorithm.  It would require a substantial amount
of refactoring to remove those assumptions and deal with the fact that
we now have two SHA-1 implementations.  It would also be tricky to deal
with the fact that for SHA-1, we might use the safe or unsafe algorithm,
but for SHA-256, there's only one algorithm structure and we need to use
it for both.

both because we have two different algos for "sha1", but also because
they are not _really_ independent. If the_hash_algo is sha1, then
whichever implementation a given piece of code is using, it must still
be one of the sha1 variants.

So I think you wouldn't want to allocate an enum or a slot in the
hash_algos array, because it is not really an independent algorithm.
But I think it _could_ work as a separate struct that the caller derives
from the main hash algorithm. For example, imagine that the
git_hash_algo struct didn't have unsafe_init_fn, etc, but instead had:

  struct git_hash_algo *unsafe_implementation;

with a matching accessor like:

  struct git_hash_algo *unsafe_hash_algo(struct git_hash_algo *algo)
  {
	/* if we have a faster "unsafe" implementation, use that */
	if (algo->unsafe_implementation)
		return algo->unsafe_implementation;
	/* otherwise just use the default one */
	return algo;
  }

And then csum-file.c, rather than calling:

  the_hash_algo->unsafe_init_fn(...);
  ...
  the_hash_algo->unsafe_final_fn(...);

would do this:

  struct git_hash_algo *algo = unsafe_hash_algo(the_hash_algo);
  algo->init_fn(...);
  ...
  algo->final_fn(...);

And likewise this test helper would have a single conditional at the
start:

  if (unsafe)
	algo = unsafe_hash_algo(algo);

and the rest of the code would just use that.

All that said, I do not think it buys us anything for "real" code. There
the decision between safe/unsafe is in the context of how we are using
it, and not based on some conditional. So while the code in this test
helper is ugly, I don't think we'd ever write anything like that for
real. So it may not be worth the effort to refactor into a more virtual
object-oriented way.

-Peff

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help