Thread (18 messages), 2 authors, 2025-01-09

Re: [PATCH 02/10] builtin/fast-import: fix segfault with unsafe SHA1

From: Patrick Steinhardt <hidden>
Date: 2025-01-07 12:06:25

On Mon, Jan 06, 2025 at 02:17:58PM -0500, Taylor Blau wrote:
On Fri, Jan 03, 2025 at 02:08:01PM +0100, Patrick Steinhardt wrote:
On Mon, Dec 30, 2024 at 12:22:34PM -0500, Taylor Blau wrote:
On Mon, Dec 30, 2024 at 03:24:02PM +0100, Patrick Steinhardt wrote:
diff --git a/builtin/fast-import.c b/builtin/fast-import.c
index 1fa2929a01b7dfee52b653248bba802884f6be6a..0f86392761abbe6acb217fef7f4fe7c3ff5ac1fa 100644
--- a/builtin/fast-import.c
+++ b/builtin/fast-import.c
@@ -1106,7 +1106,7 @@ static void stream_blob(uintmax_t len, struct object_id *oidout, uintmax_t mark)
 		|| (pack_size + PACK_SIZE_THRESHOLD + len) < pack_size)
 		cycle_packfile();

-	the_hash_algo->init_fn(&checkpoint.ctx);
+	the_hash_algo->unsafe_init_fn(&checkpoint.ctx);
This will obviously fix the issue at hand, but I don't think this is any
less brittle than before. The hash function implementation here needs to
agree with that used in the hashfile API. This change makes that
happen, but only using side information that the hashfile API uses the
unsafe variants.
Yup, I only cared about fixing the segfault because we're close to the
v2.48 release. I agree that the overall state is still extremely brittle
right now.

[snip]
I think we should perhaps combine forces here. My ideal end-state is to
have the unsafe_hash_algo() stuff land from my earlier series, then have
these two fixes (adjusted to the new world order as above), and finally
the Meson fixes after that.

Does that seem like a plan to you? If so, I can put everything together
and send it out (if you're OK with me forging your s-o-b).
I think the ideal state would be if the hashing function used was stored
as part of `struct git_hash_ctx`. So the flow basically becomes for
example:
    struct git_hash_ctx ctx;
    struct object_id oid;

    git_hash_sha1_init(&ctx);
    git_hash_update(&ctx, data);
    git_hash_final_oid(&oid, &ctx);
Note how the intermediate calls don't need to know which hash function
you used to initialize the `struct git_hash_ctx` -- the structure itself
should remember what it has been initialized with and do the right thing.
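
A minimal sketch of that idea, with two toy checksums standing in for the
real hash functions. Every `toy_*` name here is hypothetical and is not part
of git's hash API; the point is only that the context carries a pointer to
the algorithm that initialized it, so update and final calls need no side
information:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_hash_ctx;

/* Hypothetical per-algorithm vtable; not git's struct git_hash_algo. */
struct toy_hash_algo {
	const char *name;
	void (*update_fn)(struct toy_hash_ctx *ctx, const void *data, size_t len);
	uint32_t (*final_fn)(struct toy_hash_ctx *ctx);
};

/* The context records which algorithm initialized it. */
struct toy_hash_ctx {
	const struct toy_hash_algo *algop;
	uint32_t state;
};

/* Toy algorithm 1: add up all bytes. */
static void toy_sum_update(struct toy_hash_ctx *ctx, const void *data, size_t len)
{
	const unsigned char *p = data;
	while (len--)
		ctx->state += *p++;
}

/* Toy algorithm 2: XOR all bytes together. */
static void toy_xor_update(struct toy_hash_ctx *ctx, const void *data, size_t len)
{
	const unsigned char *p = data;
	while (len--)
		ctx->state ^= *p++;
}

static uint32_t toy_state_final(struct toy_hash_ctx *ctx)
{
	return ctx->state;
}

static const struct toy_hash_algo toy_sum_algo = {
	"sum", toy_sum_update, toy_state_final
};
static const struct toy_hash_algo toy_xor_algo = {
	"xor", toy_xor_update, toy_state_final
};

/* Only the init call names an algorithm. */
static void toy_hash_sum_init(struct toy_hash_ctx *ctx)
{
	ctx->algop = &toy_sum_algo;
	ctx->state = 0;
}

static void toy_hash_xor_init(struct toy_hash_ctx *ctx)
{
	ctx->algop = &toy_xor_algo;
	ctx->state = 0;
}

/* Generic entry points dispatch through the remembered algorithm. */
static void toy_hash_update(struct toy_hash_ctx *ctx, const void *data, size_t len)
{
	assert(ctx->algop); /* catches use of an uninitialized context */
	ctx->algop->update_fn(ctx, data, len);
}

static uint32_t toy_hash_final(struct toy_hash_ctx *ctx)
{
	assert(ctx->algop);
	return ctx->algop->final_fn(ctx);
}
```

With this shape, a caller that initializes with the "wrong" variant still gets
a consistent result from that variant, rather than one implementation reading
a context laid out by another.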
I'm not sure I'm following you here. In the stream_blob() function
within fast-import, the problem isn't that we're switching hash
functions mid-stream, but that we're initializing the hashfile_context
structure with the wrong hash function to begin with.
True, but it would have been a non-issue if the hash context itself knew
which hash function to use for updates. Sure, we would've used the slow
variant of SHA1 instead of the fast-but-unsafe one. But that feels like
the lesser evil compared to crashing.
You snipped it out of your reply, but I think that my suggestion to do:

    pack_file->algop->init_fn(&checkpoint.ctx);

would harden us against the broken behavior we're seeing here.

As a separate defense-in-depth measure, we could teach functions from
the hashfile API which deal with hashfile_checkpoint structure to ensure
that the hashfile and its checkpoint both use the same algorithm (by
adding a hash_algo field to the hashfile_checkpoint structure).
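
The defense-in-depth check described above could look roughly like the
following. This is a sketch under assumptions, not the hashfile API: the
`toy_*` structures and functions are hypothetical stand-ins, and the hash
state is reduced to a single integer. The checkpoint records the algorithm
it was set up for, and saving or restoring refuses to proceed on a mismatch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical algorithm descriptor; identity is compared by pointer. */
struct toy_algo {
	const char *name;
};

/* Stand-in for a hashfile: an algorithm, running state, and offset. */
struct toy_hashfile {
	const struct toy_algo *algop;
	uint32_t state;
	size_t offset;
};

/* Stand-in for a checkpoint, with the extra algorithm field. */
struct toy_checkpoint {
	const struct toy_algo *algop;
	uint32_t state;
	size_t offset;
};

/* Derive the checkpoint's algorithm from the file it belongs to. */
static void toy_checkpoint_init(struct toy_checkpoint *cp,
				const struct toy_hashfile *f)
{
	cp->algop = f->algop;
	cp->state = 0;
	cp->offset = 0;
}

/* Save the file's state; returns -1 if the algorithms disagree. */
static int toy_checkpoint_save(const struct toy_hashfile *f,
			       struct toy_checkpoint *cp)
{
	if (cp->algop != f->algop)
		return -1;
	cp->state = f->state;
	cp->offset = f->offset;
	return 0;
}

/* Roll the file back to the checkpoint; same consistency check. */
static int toy_checkpoint_restore(struct toy_hashfile *f,
				  const struct toy_checkpoint *cp)
{
	if (cp->algop != f->algop)
		return -1;
	f->state = cp->state;
	f->offset = cp->offset;
	return 0;
}
```

A real implementation would more likely die with a bug message than return
an error code; the return value here only keeps the sketch easy to test.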
I would think that it would be even harder to abuse if it weren't the
hashfile API, but the hash API itself, that remembered the algorithm.

Patrick