Re: [PATCH] read-cache: avoid misaligned reads in index v4
From: Victoria Dye <hidden>
Date: 2022-09-26 16:47:53
Jeff King wrote:
On Fri, Sep 23, 2022 at 07:43:55PM +0000, Victoria Dye via GitGitGadget wrote:quoted
@@ -1883,7 +1883,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool, size_t len; const char *name; const unsigned hashsz = the_hash_algo->rawsz; - const uint16_t *flagsp = (const uint16_t *)(ondisk->data + hashsz); + const char *flagsp = ondisk + offsetof(struct ondisk_cache_entry, data) + hashsz;Now we use the "const char *" pointer instead of the cast to the ondisk_cache_entry struct, which is good, and is what fixes the alignment question. But we also convert flagsp from being a uint16_t into a byte pointer. I'm not sure if that's strictly necessary from an alignment perspective, as we'd dereference it only via get_be16(), which handles alignment and type conversion itself. I'd imagine the standard probably says that even forming such a pointer is illegal, so in that sense, it probably is undefined behavior. But I think it's one of those things that's OK in practice.
Yep, per the C standard §6.3.2.3 #7 [1]: A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. To your point, it is probably fine in practice, but I'd lean towards sticking with a 'char *' to play it safe. [1] https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
quoted
@@ -1935,20 +1935,24 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool, ce = mem_pool__ce_alloc(ce_mem_pool, len); - ce->ce_stat_data.sd_ctime.sec = get_be32(&ondisk->ctime.sec);[...] + ce->ce_stat_data.sd_ctime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, ctime) + + offsetof(struct cache_time, sec));I had figured we'd be able to drop ondisk_cache_entry entirely. But here you're using it essentially as a template for a set of constants retrieved via offsetof(). That's OK from an alignment perspective. It does mean we'd be in trouble if a compiler ever decided to introduce padding into the struct. That's probably unlikely. We don't use __attribute__((packed)) because it's not portable, and our existing uses have generally been OK, because our data structures are organized around 8-byte alignment. We might have problems on a theoretical 128-bit processor or something.
In addition to portability, using '__attribute__((packed))' could hurt performance (and, in a large index, that might have a noticeable effect). As for dropping 'ondisk_cache_entry()', I didn't want to drop it only from the "read" operation (and use something like the "parse left-to-right" strategy below) while leaving it in "write." And, as you mentioned later, changing 'ce_write_entry()' is a lot more invasive than what's already in this patch and possibly out-of-scope.
Another strategy is to just parse left-to-right, advancing the byte pointer. Like: ce->ce_state_data.sd_ctime.sec = get_be32(ondisk); ondisk += sizeof(uint32_t); ce->ce_state_data.sd_mtime.sec = get_be32(ondisk); ondisk += sizeof(uint32_t); ...etc... You can even stick that in a helper function that does the get_b32() and advances, so you know they're always done in sync. See pack-bitmap.c's read_be32(), etc. IMHO this produces a nice result because the reading code itself becomes the source of truth for the format.
...
One final note, though:quoted
+ ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime) + + offsetof(struct cache_time, sec));Here (and elsewhere), you can assume that the offsetof() "sec" in cache_time is 0, for two reasons: - I didn't look up chapter and verse, but I'm pretty sure the standard does guarantee that the first field of a struct is at the beginning. - If there's any padding, this whole scheme is hosed anyway, because it means sizeof(cache_time) is bigger than we expect, which messes up the offsetof() the entry after us (in this case sd_dev). So this can just be: ce->ce_stat_data.sd_mtime.sec = get_be32(ondisk + offsetof(struct ondisk_cache_entry, mtime)); which is mercifully shorter. Assuming we dismiss the rest of what I said as not worth it for a minimal fix, I do think that simplification is worth rolling a v2.
That makes sense from a technical perspective, but I included the starting entry offset for readability reasons. It might be confusing to someone unfamiliar with C struct memory alignment to see every other 'get_be32' refer to the exact entry it's reading via the 'offsetof()', but have that information absent only for a few entries. And, the double 'offsetof()' would still be used by the 'mtime.nsec'/'ctime.nsec' fields anyway. In any case, if this patch is intended to be a short-lived change on the way to a more complete refactor and/or I'm being overzealous on the readability, I'd be happy to change it. :) Thanks!
-Peff PS BTW, I mentioned earlier "can we just get rid of ondisk_cache_entry". We also use it for the writing side, of course. That doesn't have alignment issues, but it does have the same "I hope there's never any padding" question. In an ideal world, it would be using the equivalent put_be32(), but again, that's getting out of the "minimal fix" territory.