Re: [PATCH 4/6] pack-bitmap.c: factor out manual `map_pos` manipulation
From: Taylor Blau <hidden>
Date: 2023-03-24 23:09:02
On Tue, Mar 21, 2023 at 01:56:12PM -0400, Jeff King wrote:
I like the idea of centralizing the bounds checks here.
I do think copying lseek() is a little awkward, though. As you note, we
only care about SEEK_SET and SEEK_CUR, and we have to BUG() on anything
else. Which means we have a run-time check, rather than a compile time
check. If we had two functions like:
void bitmap_index_seek_to(struct bitmap_index *bitmap_git, size_t pos)
{
bitmap_git->map_pos = pos;
if (bitmap_git->map_pos >= bitmap_git->map_size)
...etc...
}
void bitmap_index_incr(struct bitmap_index *bitmap_git, size_t offset)
{
bitmap_index_seek_to(bitmap_git, st_add(bitmap_git->map_pos, offset));
}
then the compiler can see all cases directly. I dunno how much it
matters.Yeah, I think that trying to copy the interface of lseek() was a mistake, since it ended up creating more awkwardness than anything else. I like the combination of bitmap_index_seek_to() and _incr(), though though I called the latter bitmap_index_seek_set() instead. We've got to be a little reminiscent of lseek() after all ;-).
The other thing that's interesting from a bounds-checking perspective is that you're checking the seek _after_ you've read an item. Which has two implications: - I don't think we could ever overflow size_t here.
Yep, agreed. Let's drop the st_add() call there entirely, since it's not doing anything useful and just serving to confuse the reader.
- The more interesting case is that we are not overflowing size_t, but
simply walking past bitmap_git->map_size. But again, we are reading
first and then seeking. So if our seek does go past, then by
definition we just read garbage bytes, which is undefined behavior.
For a bounds-check, wouldn't we want it the other way around? To
make sure we have 4 bytes available, and then if so read them and
increment the offset?I thought on first blush that this would be a much larger change, but actually it's quite small. The EWAH code already checks its reads ahead of time, so we don't have to worry about those. It's really that read_be32() and read_u8() do the read-then-seek. So I think that the change here is limited to making sure that we can read enough bytes in those functions before actually executing the read.
quoted
+ if (bitmap_git->map_pos >= bitmap_git->map_size) + BUG("bitmap position exceeds size (%"PRIuMAX" >= %"PRIuMAX")", + (uintmax_t)bitmap_git->map_pos, + (uintmax_t)bitmap_git->map_size);This uses ">=", which is good, because it is valid to walk the pointer to one past the end of the map, which is effectively EOF. But as above, in that condition the callers should be checking for this EOF state before reading the bytes.
I think that having both is useful. Checking before you read avoids undefined behavior. Checking after you seek ensures that we never have a bitmap_index struct with a bogus map_pos value. Thanks, Taylor