Thread (5 messages) 5 messages, 4 authors, 2025-11-03

Re: [PATCH 03/14] hash: use uint32_t for object_id algorithm

From: Collin Funk <hidden>
Date: 2025-10-30 01:58:54

Hi Brian,

"brian m. carlson" [off-list ref] writes:
On 2025-10-28 at 19:33:32, Junio C Hamano wrote:
quoted
Yeah, I do not very much appreciate change from "int" to "uint32_t"
randomly done only for things that happen to be used by both C and
Rust.  "When should I use 'int' or 'unsigned' and when should I use
'uint32_t'?" becomes extremely hard to answer.
In general, the answer is that we should use `int` or `unsigned` when
you're defining a loop index or other non-structure types that are only
used from C.  Otherwise, we should use one of the stdint.h or stddef.h
types ((u)int*_t, (s)size_t, etc.), since these have defined,
well-understood sizes.  Also, in general, we want to use unsigned types
for things that cannot have valid negative values (such as the hash
algorithm constants that are also array indices), especially since Rust
tends not to use sentinel values (preferring `Option` instead).
I don't necessarily disagree with your point, just want to reiterate a
point a touched on in another thread [1]. In some cases it is valuable
to use signed integers even if a valid value will never be negative.
This is because signed integer overflow can be easily caught with
-fsanitize=undefined. An unsigned integer wrapping around is perfectly
defined, but may lead to strange bugs in your program.
Part of our problem is that being lazy and making lots of assumptions in
our codebase has led to some suboptimal consequences.  Our diff code
can't handle files bigger than about 1 GiB because we use `int` and
Windows has all sorts of size limitations because we assumed that
sizeof(long) == sizeof(size_t) == sizeof(void *).  Nobody now would say,
"Gee, I think we'd like to have these arbitrary 32-bit size limits," and
using something with a fixed size helps us think, "How big should this
data type be?  Do I really want to limit this data structure to
processing only 32 bits worth of data?"

In this case, the use of a 32-bit value is fine because we already have
that for the existing type (via `int`) and it is extremely unlikely that
4 billion cryptographic hash algorithms will ever be created, let alone
implemented in Git, so the size is not a factor.
I guess intmax_t and uintmax_t are probably not usable with Rust, since
they are not fixed width?

Collin

[1] https://public-inbox.org/git/87jz16dux5.fsf@gmail.com/
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help