Re: [PATCH v2 2/2] Prevent git from rehashing 4GiB files
From: Jason Hatton <hidden>
Date: 2023-10-17 14:50:49
From: Jeff King <redacted>
On Thu, Oct 12, 2023 at 04:09:30PM +0000, brian m. carlson wrote:quoted
+static unsigned int munge_st_size(off_t st_size) { + unsigned int sd_size = st_size; + + /* + * If the file is an exact multiple of 4 GiB, modify the value so it + * doesn't get marked as racily clean (zero). + */ + if (!sd_size && st_size) + return 0x80000000; + else + return sd_size; +}Coverity complained that the "true" side of this conditional is unreachable, since sd_size is assigned from st_size, so the two values cannot be both true and false. But obviously we are depending here on the truncation of off_t to "unsigned int". I'm not sure if Coverity is just dumb, or if it somehow has a different size for off_t. I don't _think_ this would ever cause confusion in a real compiler, as assignment from a larger type to a smaller has well-defined truncation, as far as I know. But I do wonder if an explicit "& 0xFFFFFFFF" would make it more obvious what is happening (which would also do the right thing if in some hypothetical platform "unsigned int" ended up larger than 32 bits). -Peff
I originally wrote the code this way to work exactly like the original
code with one exception: Never truncate a nonzero st_size to a zero
sd_size. The original code is here in fill_stat_data:
I was attempting to use exactly the same implicit type conversion and
types as the original.
We could probably write the below to do the same thing.
void fill_stat_data(struct stat_data *sd, struct stat *st)
{
sd->sd_ctime.sec = (unsigned int)st->st_ctime;
sd->sd_mtime.sec = (unsigned int)st->st_mtime;
sd->sd_ctime.nsec = ST_CTIME_NSEC(*st);
sd->sd_mtime.nsec = ST_MTIME_NSEC(*st);
sd->sd_dev = st->st_dev;
sd->sd_ino = st->st_ino;
sd->sd_uid = st->st_uid;
sd->sd_gid = st->st_gid;
sd->sd_size = st->st_size;
+ if (sd->sd_size == 0 && st->st_size!= 0) {
+ sd->sd_size = 1;
+ }
}
- Jason D. Hatton