Thread (5 messages) 5 messages, 4 authors, 2025-11-03

Re: [PATCH 03/14] hash: use uint32_t for object_id algorithm

From: Ezekiel Newren <hidden>
Date: 2025-10-28 19:58:34

On Tue, Oct 28, 2025 at 1:33 PM Junio C Hamano [off-list ref] wrote:
Patrick Steinhardt [off-list ref] writes:
quoted
On Mon, Oct 27, 2025 at 12:43:53AM +0000, brian m. carlson wrote:
quoted
We currently use an int for this value, but we'll define this structure
from Rust in a future commit and we want to ensure that our data types
are exactly identical.  To make that possible, use a uint32_t for the
hash algorithm.
An alternative would be to introduce an enum and set up bindgen so that
we can pull this enum into Rust. I'd personally favor that over using an
uint32_t as it conveys way more meaning. Have you considered this?
Yeah, I do not very much appreciate change from "int" to "uint32_t"
randomly done only for things that happen to be used by both C and
Rust.  "When should I use 'int' or 'unsigned' and when should I use
'uint32_t'?" becomes extremely hard to answer.
I think the most appropriate time to change from C's ambiguous types
to unambiguous types is when it's going to be used for Rust FFI.
uint32_t should be used everywhere and casting to int or unsigned
should be done where that code hasn't been converted yet. This commit
isn't random, it's a deliberate effort to address code debt.
I suspect that it would be much more palatable if these functions
and struct members are to use a distinct type that is used only by
hash algorithm number (your "enum" is fine), that is typedef'ed to
be the 32-bit unsigned integer, e.g,

    +typedef uint32_t hash_algo_type;
    -int hash_algo_by_name(const char *name)
    +hash_algo_type hash_algo_by_name(const char *name)

Yeah, I know that C does not give us type safety against mixing two
different things, both of which are typedef'ed to the same uint32_t,
but doing something like the above would still add documentation
value.
I'm against passing Rust enum types over the FFI boundary since Rust
is free to add extra bytes to distinguish between types (and it's
documented by Rust as not being ABI stable). Even if something like
#[repr(C)] is used the problem is that the enum on the Rust side will
have an implicit field where that implicit field will need to be made
explicit on the C side, and if C sets an invalid value for that
implicit field then that will result in Rust UB. Converting Rust enum
types to C is non-trival and has many gotchas.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help