Re: [PATCH v4 1/4] gitformat-commit-graph: describe version 2 of BDAT
From: Taylor Blau <hidden>
Date: 2023-06-21 12:08:47
On Tue, Jun 13, 2023 at 02:58:24PM -0700, Junio C Hamano wrote:
"bloom" -> "Bloom", probably, as the name comes from the name of its inventor (just like we spell "Boolean", not "boolean").
Indeed.
> > + when char is signed and the repository has path names that have characters >=
> > + 0x80; Git supports reading and writing them, but this ability will be removed
> > + in a future version of Git.
>
> Makes sense. I wonder if we want to mention what the undesired misbehaviour the "bug" causes and what we do to avoid getting affected by the bug here. If we can say something like "When querying for a pathname with a byte with high-bit set, the buggy filter may produce false negative, making the filter unusable, but asking for a pathname without such a byte produces no false negatives (even though we may get false positives). When Git reads version 1 filter data, it refrains from using it for processing paths with high-bit set to avoid triggering the bug", then it would be ideal.
Your description of the bug matches my understanding of the issue, that a corrupt filter would produce false negatives and thus be unusable.

I skimmed through the rest of the series, and couldn't find a spot where we do the latter, i.e. still use v1 filters as long as we don't have any characters in the path with high-order bits set.

I think this would be as simple as modifying the Bloom filter query function to return "maybe" before even trying to hash a path with at least one character with its high-bit set.

Apologies if this functionality is implemented and I just missed it.

Thanks,
Taylor
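
For illustration only, the early return suggested above might look roughly like the sketch below. Every name in it (filter_answer, filter_lookup_fn, path_has_high_bit, query_filter, lookup_always_no) is hypothetical and not taken from Git's bloom.c or from this series; the real query function has a different signature. The sketch only assumes that a version 1 filter should answer "maybe" for any path containing a byte >= 0x80, without hashing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-valued answer; not Git's actual return convention. */
enum filter_answer {
	FILTER_DEFINITELY_NOT = 0,
	FILTER_MAYBE = 1
};

/* Hypothetical stand-in for the real hash-and-probe lookup. */
typedef enum filter_answer (*filter_lookup_fn)(const char *path, size_t len);

/* Return 1 if any byte of the path has its high bit set (value >= 0x80). */
static int path_has_high_bit(const char *path, size_t len)
{
	size_t i;

	assert(path != NULL);
	for (i = 0; i < len; i++) {
		/* Cast so the check does not depend on whether char is signed. */
		if ((unsigned char)path[i] & 0x80)
			return 1;
	}
	return 0;
}

/*
 * Query a filter, but never trust a version 1 filter for a path with
 * high-bit bytes: answer "maybe" before hashing, so the caller falls
 * back to the slow path instead of risking a false negative.
 */
static enum filter_answer query_filter(int hash_version,
				       const char *path, size_t len,
				       filter_lookup_fn lookup)
{
	if (hash_version == 1 && path_has_high_bit(path, len))
		return FILTER_MAYBE;
	return lookup(path, len);
}

/* Demo stub: a filter that claims no path is present. */
static enum filter_answer lookup_always_no(const char *path, size_t len)
{
	(void)path;
	(void)len;
	return FILTER_DEFINITELY_NOT;
}
```

With the stub above, query_filter(1, "caf\xc3\xa9", 5, lookup_always_no) answers FILTER_MAYBE without consulting the filter, while an ASCII-only path, or any path under version 2, is passed through to the lookup as usual.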