Re: [PATCH 03/14] hash: use uint32_t for object_id algorithm
From: Ezekiel Newren <hidden>
Date: 2025-10-28 19:58:34
On Tue, Oct 28, 2025 at 1:33 PM Junio C Hamano [off-list ref] wrote:
Patrick Steinhardt [off-list ref] writes:quoted
On Mon, Oct 27, 2025 at 12:43:53AM +0000, brian m. carlson wrote:quoted
We currently use an int for this value, but we'll define this structure from Rust in a future commit and we want to ensure that our data types are exactly identical. To make that possible, use a uint32_t for the hash algorithm.An alternative would be to introduce an enum and set up bindgen so that we can pull this enum into Rust. I'd personally favor that over using an uint32_t as it conveys way more meaning. Have you considered this?Yeah, I do not very much appreciate change from "int" to "uint32_t" randomly done only for things that happen to be used by both C and Rust. "When should I use 'int' or 'unsigned' and when should I use 'uint32_t'?" becomes extremely hard to answer.
I think the most appropriate time to change from C's ambiguous types to unambiguous types is when it's going to be used for Rust FFI. uint32_t should be used everywhere and casting to int or unsigned should be done where that code hasn't been converted yet. This commit isn't random, it's a deliberate effort to address code debt.
I suspect that it would be much more palatable if these functions
and struct members are to use a distinct type that is used only by
hash algorithm number (your "enum" is fine), that is typedef'ed to
be the 32-bit unsigned integer, e.g,
+typedef uint32_t hash_algo_type;
-int hash_algo_by_name(const char *name)
+hash_algo_type hash_algo_by_name(const char *name)
Yeah, I know that C does not give us type safety against mixing two
different things, both of which are typedef'ed to the same uint32_t,
but doing something like the above would still add documentation
value.I'm against passing Rust enum types over the FFI boundary since Rust is free to add extra bytes to distinguish between types (and it's documented by Rust as not being ABI stable). Even if something like #[repr(C)] is used the problem is that the enum on the Rust side will have an implicit field where that implicit field will need to be made explicit on the C side, and if C sets an invalid value for that implicit field then that will result in Rust UB. Converting Rust enum types to C is non-trival and has many gotchas.