Re: [PATCH 12/20] commit-graph: check size of commit data chunk
From: Taylor Blau <hidden>
Date: 2023-10-11 18:46:34
On Mon, Oct 09, 2023 at 05:05:36PM -0400, Jeff King wrote:
quoted hunk ↗ jump to hunk
We expect a commit-graph file to have a fixed-size data record for each commit in the file (and we know the number of commits to expct from the size of the lookup table). If we encounter a file where this is too small, we'll look past the end of the chunk (and possibly even off the mapped memory). We can fix this by checking the size up front when we record the pointer. The included test doesn't segfault, since it ends up reading bytes from another chunk. But it produces nonsense results, since the values it reads are garbage. Our test notices this by comparing the output to a non-corrupted run of the same command (and of course we also check that the expected error is printed to stderr). Signed-off-by: Jeff King <redacted> --- commit-graph.c | 12 +++++++++++- t/t5318-commit-graph.sh | 9 +++++++++ 2 files changed, 20 insertions(+), 1 deletion(-)diff --git a/commit-graph.c b/commit-graph.c index 472332f603..9b80bbd75b 100644 --- a/commit-graph.c +++ b/commit-graph.c@@ -340,6 +340,16 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start, return 0; } +static int graph_read_commit_data(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = data; + if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH)
Should this be guarded with an st_mult? I think that GRAPH_DATA_WIDTH is defined as (the_hash_algo->rawsz + 16), so I *think* that the expression in the parenthesis would get done as a size_t, and then g->num_commits would be widened to a size_t for the purposes of evaluating this expression. So I think that this is all OK in the sense that we'd never underflow the 64-bit space, and having more than 2^64-1/36 (some eighteen quintillion) commits is... unlikely ;-). But it may be worth wrapping this computation in an st_mult() anyway to avoid future readers having to think about this.
quoted hunk ↗ jump to hunk
+ return error("commit-graph commit data chunk is wrong size"); + g->chunk_commit_data = chunk_start; + return 0; +} + static int graph_read_bloom_data(const unsigned char *chunk_start, size_t chunk_size, void *data) {@@ -422,7 +432,7 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph); read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph); - pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data); + read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);
Here again would be a good use-case for a `pair_chunk_expect()` function, but I don't want to beat a dead horse ;-). Thanks, Taylor