Re: t9210-scalar.sh fails with SANITIZE=undefined
From: Jeff King <hidden>
Date: 2022-09-22 22:27:24
On Thu, Sep 22, 2022 at 03:09:52PM -0700, Victoria Dye wrote:
> Other than allowing us to use a (non-packed) 'struct ondisk_cache_entry' to parse the index entries, is there any reason why the on-disk index entries should be (or need to be) 4-byte aligned? If not, we could update how we read the 'ondisk' index entry in 'create_from_disk()' to avoid the misalignment.
I don't think so. And indeed, we already use get_be16(), etc., for most of the access (which is mostly there for endian-fixing, but also resolves alignment problems).
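As a rough illustration of why byte-wise accessors sidestep the alignment problem, here is a minimal C sketch. read_be16_bytes() and read_be32_bytes() are hypothetical stand-ins, not Git's actual get_be16()/get_be32(); the sketch assumes those helpers likewise assemble the value one byte at a time. Because only unsigned char loads happen, the pointer may sit at any address, and the result is the same on big- and little-endian hosts.

```c
#include <stdint.h>

/*
 * Hypothetical byte-wise big-endian readers (stand-ins for get_be16() and
 * get_be32()). Each byte is loaded through an unsigned char pointer, which
 * has no alignment requirement, so `ptr` may point anywhere in a buffer.
 */
static uint16_t read_be16_bytes(const void *ptr)
{
	const unsigned char *p = ptr;
	return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static uint32_t read_be32_bytes(const void *ptr)
{
	const unsigned char *p = ptr;
	/* widen before shifting so p[0] << 24 cannot overflow a signed int */
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}
```

By contrast, `*(const uint16_t *)(buf + 1)` is a uint16_t load through a pointer that need not be 2-byte aligned, which is the kind of access UBSan's alignment check reports, and it would still need a byte swap on little-endian machines.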
> ------------------8<------------------8<------------------8<------------------
> diff --git a/read-cache.c b/read-cache.c
> index b09128b188..f132a3f256 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1875,7 +1875,7 @@ static int read_index_extension(struct index_state *istate,
>  
>  static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>  					    unsigned int version,
> -					    struct ondisk_cache_entry *ondisk,
> +					    const char *ondisk,
>  					    unsigned long *ent_size,
>  					    const struct cache_entry *previous_ce)
>  {
> @@ -1883,7 +1883,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>  	size_t len;
>  	const char *name;
>  	const unsigned hashsz = the_hash_algo->rawsz;
> -	const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz);
> +	const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;
>  	unsigned int flags;
>  	size_t copy_len = 0;
>  	/*
> ------------------>8------------------>8------------------>8------------------
>
> Then do the same sort of conversion with the rest of the function. It's
> certainly uglier than just using the 'struct ondisk_cache_entry *'
> pointer, but it should avoid the misaligned addressing error.
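A self-contained sketch of the offset-based access in the quoted hunk, under stated assumptions: struct demo_ondisk_entry is a simplified, hypothetical mirror of the fixed-size prefix of an on-disk entry (two 8-byte timestamps, six 32-bit fields, then the hash and flags), and demo_be16()/demo_be32() are hypothetical byte-wise readers. The struct is only ever handed to offsetof(); the entry itself stays a `const char *`, so it can start at any byte offset in the index without a misaligned load.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical timestamp pair: two 32-bit fields, 8 bytes total. */
struct demo_cache_time {
	uint32_t sec;
	uint32_t nsec;
};

/*
 * Simplified, hypothetical mirror of an on-disk entry's fixed part.
 * Used only with offsetof(); never used to dereference the buffer.
 */
struct demo_ondisk_entry {
	struct demo_cache_time ctime, mtime;
	uint32_t dev, ino, mode, uid, gid, size;
	unsigned char data[32 + 2 * sizeof(uint16_t)]; /* hash, then flags */
};

/* Byte-wise big-endian readers: safe at any address. */
static uint16_t demo_be16(const char *ptr)
{
	const unsigned char *p = (const unsigned char *)ptr;
	return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t demo_be32(const char *ptr)
{
	const unsigned char *p = (const unsigned char *)ptr;
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Flags live right after the hash, which starts at offsetof(..., data). */
static uint16_t demo_entry_flags(const char *ondisk, size_t hashsz)
{
	const char *flagsp = ondisk + offsetof(struct demo_ondisk_entry, data) + hashsz;
	return demo_be16(flagsp);
}

/* Any fixed field can be read the same way, via its offset. */
static uint32_t demo_entry_size(const char *ondisk)
{
	return demo_be32(ondisk + offsetof(struct demo_ondisk_entry, size));
}
```

The cost is readability: every field access becomes an explicit offset computation plus a byte-wise read, which is the ugliness the quoted message concedes.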
Yeah, I think that's probably the only reasonable solution. I thought ditching ondisk_cache_entry entirely (which is basically what this is doing) would be a tough sell, but a quick "grep" shows it really isn't used in all that many spots.

I also wondered why other versions do not have a similar problem. After all, cache entries contain pathnames which are going to be of varying lengths. But this seems telling:

  $ git grep -m1 -B1 -A2 align_padding_size
  read-cache.c-/* These are only used for v3 or lower */
  read-cache.c:#define align_padding_size(size, len) ((size + (len) + 8) & ~7) - (size + len)
  read-cache.c-#define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,data) + (len) + 8) & ~7)
  read-cache.c-#define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len)

So we actually pad the entries in earlier versions to align them, but don't in v4. I'm not sure if that was a conscious choice to save space, or an unintended consequence (though it is mentioned in the docs, I think that came after the code).

That's probably all obvious to people who work with the index a lot. It's the one part of Git I've mostly managed to remain oblivious to. :)

-Peff
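The padding macros in the grep output round each v2/v3 entry with `(x + 8) & ~7`, which gives the next multiple of 8 strictly greater than `x`, so there is always room for at least one NUL byte. A small C sketch of that arithmetic, with assumptions stated: entries start after the 12-byte index header, `fixed` is the number of bytes before `data` (40 in the simplified layout assumed here), `len` covers the hash, flags and name bytes, and an unpadded v4-style entry is modeled as just those bytes plus a NUL, ignoring v4's path prefix compression. All function names are hypothetical.

```c
#include <stddef.h>

/* Same rounding as the quoted align_flex_name(): next multiple of 8 > x. */
static size_t padded_entry_size(size_t fixed, size_t len)
{
	return (fixed + len + 8) & ~(size_t)7;
}

/* Hypothetical unpadded entry: content plus one NUL, no rounding. */
static size_t unpadded_entry_size(size_t fixed, size_t len)
{
	return fixed + len + 1;
}

/*
 * Byte offset of entry `i` when entries with the given `lens` are laid
 * out back to back after an assumed 12-byte header.
 */
static size_t entry_offset(const size_t *lens, size_t i, size_t fixed, int padded)
{
	size_t off = 12;
	for (size_t k = 0; k < i; k++)
		off += padded ? padded_entry_size(fixed, lens[k])
			      : unpadded_entry_size(fixed, lens[k]);
	return off;
}
```

Because 12 is a multiple of 4 and every padded size is a multiple of 8, each padded entry starts 4-byte aligned. With `fixed = 40` and lengths 27, 32 and 25, for example, the padded entries start at offsets 12, 84 and 164, while the unpadded ones start at 12, 80 and 153, and 153 is not a multiple of 4.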