Re: [PATCH 03/14] hash: use uint32_t for object_id algorithm
From: Collin Funk <hidden>
Date: 2025-10-30 01:58:54
Hi Brian, "brian m. carlson" [off-list ref] writes:
On 2025-10-28 at 19:33:32, Junio C Hamano wrote:quoted
Yeah, I do not very much appreciate change from "int" to "uint32_t" randomly done only for things that happen to be used by both C and Rust. "When should I use 'int' or 'unsigned' and when should I use 'uint32_t'?" becomes extremely hard to answer.In general, the answer is that we should use `int` or `unsigned` when you're defining a loop index or other non-structure types that are only used from C. Otherwise, we should use one of the stdint.h or stddef.h types ((u)int*_t, (s)size_t, etc.), since these have defined, well-understood sizes. Also, in general, we want to use unsigned types for things that cannot have valid negative values (such as the hash algorithm constants that are also array indices), especially since Rust tends not to use sentinel values (preferring `Option` instead).
I don't necessarily disagree with your point, just want to reiterate a point a touched on in another thread [1]. In some cases it is valuable to use signed integers even if a valid value will never be negative. This is because signed integer overflow can be easily caught with -fsanitize=undefined. An unsigned integer wrapping around is perfectly defined, but may lead to strange bugs in your program.
Part of our problem is that being lazy and making lots of assumptions in our codebase has led to some suboptimal consequences. Our diff code can't handle files bigger than about 1 GiB because we use `int` and Windows has all sorts of size limitations because we assumed that sizeof(long) == sizeof(size_t) == sizeof(void *). Nobody now would say, "Gee, I think we'd like to have these arbitrary 32-bit size limits," and using something with a fixed size helps us think, "How big should this data type be? Do I really want to limit this data structure to processing only 32 bits worth of data?" In this case, the use of a 32-bit value is fine because we already have that for the existing type (via `int`) and it is extremely unlikely that 4 billion cryptographic hash algorithms will ever be created, let alone implemented in Git, so the size is not a factor.
I guess intmax_t and uintmax_t are probably not usable with Rust, since they are not fixed width? Collin [1] https://public-inbox.org/git/87jz16dux5.fsf@gmail.com/