Thread (5 messages), 4 authors, 2023-06-23

Re: [PATCH v4 1/4] gitformat-commit-graph: describe version 2 of BDAT

From: Derrick Stolee <hidden>
Date: 2023-06-23 13:06:18

On 6/22/2023 6:26 PM, Jonathan Tan wrote:
> Taylor Blau [off-list ref] writes:
>>> I wonder if we want to mention what the undesired misbehaviour the
>>> "bug" causes and what we do to avoid getting affected by the bug
>>> here.  If we can say something like "When querying for a pathname
>>> with a byte with high-bit set, the buggy filter may produce false
>>> negative, making the filter unusable, but asking for a pathname
>>> without such a byte produces no false negatives (even though we may
>>> get false positives).  When Git reads version 1 filter data, it
>>> refrains from using it for processing paths with high-bit set to
>>> avoid triggering the bug", then it would be ideal.
>> Your description of the bug matches my understanding of the issue, that
>> a corrupt filter would produce false negatives and thus be unusable.
>>
>> I skimmed through the rest of the series, and couldn't find a spot where
>> we do the latter, i.e. still use v1 filters as long as we don't have any
>> characters in the path with high-order bits set.
>>
>> I think this would be as simple as modifying the Bloom filter query
>> function to return "maybe" before even trying to hash a path with at
>> least one character with its high-bit set.
>>
>> Apologies if this functionality is implemented and I just missed it.
>>
>> Thanks,
>> Taylor
> Thanks for the suggestion - yeah, this might work.

If I understand the situation correctly, the high bits can make the
hashes "not very random" but they are still effective at identifying
the "maybe" case consistently for the inputs it is given (it would not
present a "no" when it should not, but it might say "maybe" more often
than it should). The behavior is only incorrect if the same commit-graph
file is used with two different Git versions that were compiled with
different signedness for "char".
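
To make the signedness point concrete, here is a simplified sketch of
how widening path bytes into a 32-bit hash word can differ between the
two kinds of build. This is an illustration only, not Git's actual
hashing code: widen_unsigned(), widen_signed() and pack_word() are
hypothetical names, and the sign extension is emulated with a mask so
the result does not depend on the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen a byte the way a build with unsigned char would: zero-extend. */
static uint32_t widen_unsigned(uint8_t b)
{
	return (uint32_t)b;
}

/*
 * Emulate widening in a build with signed char: bytes with the high
 * bit set are sign-extended, so the upper 24 bits become all ones.
 */
static uint32_t widen_signed(uint8_t b)
{
	return (b & 0x80) ? ((uint32_t)b | 0xFFFFFF00u) : (uint32_t)b;
}

/*
 * Pack four bytes into one word, lowest byte first, as a block-based
 * hash typically does before mixing. The emulate_signed flag selects
 * which widening rule is applied (hypothetical helper).
 */
static uint32_t pack_word(const uint8_t *p, int emulate_signed)
{
	uint32_t k = 0;
	int i;

	assert(p != NULL);
	for (i = 0; i < 4; i++) {
		uint32_t w = emulate_signed ? widen_signed(p[i])
					    : widen_unsigned(p[i]);
		k |= w << (8 * i);
	}
	return k;
}
```

For pure ASCII input both rules produce the same word, so the hashes
agree. Once a byte has its high bit set, the sign-extended ones spill
into the neighbouring byte positions and the two builds feed different
words into the hash.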

If that is the case, then ignoring the Bloom filters when you see a
high bit would change the performance implication from "probably
slower" to "definitely slower" but not affect the correctness in a
system that doesn't have competing Git versions with different
compiler semantics.
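
For reference, the guard being discussed could look roughly like the
sketch below. It is not taken from the series: the filter_answer enum,
path_has_high_bit(), guarded_query() and the always_no() stub are
hypothetical names, and I am assuming the filter version is available
as a plain integer at query time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical answer type: a Bloom filter can only say "no" or "maybe". */
enum filter_answer { FILTER_NO = 0, FILTER_MAYBE = 1 };

/* Hypothetical callback standing in for the real filter lookup. */
typedef enum filter_answer (*filter_query_fn)(const char *path);

/* Return 1 if any byte of the NUL-terminated path has its high bit set. */
static int path_has_high_bit(const char *path)
{
	const unsigned char *p = (const unsigned char *)path;

	assert(path != NULL);
	for (; *p; p++)
		if (*p & 0x80)
			return 1;
	return 0;
}

/*
 * For version 1 filters, answer "maybe" without hashing when the path
 * contains a high-bit byte; otherwise defer to the real lookup.
 */
static enum filter_answer guarded_query(int filter_version, const char *path,
					filter_query_fn query)
{
	if (filter_version == 1 && path_has_high_bit(path))
		return FILTER_MAYBE;
	return query(path);
}

/* Stub lookup for illustration: claims every path is absent. */
static enum filter_answer always_no(const char *path)
{
	(void)path;
	return FILTER_NO;
}
```

With the stub, guarded_query(1, "caf\xc3\xa9", always_no) yields
FILTER_MAYBE, while an ASCII-only path or a version 2 filter falls
through to the lookup. Every "maybe" forced this way costs a tree diff
that the filter might otherwise have avoided.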

That is to say, doing this extra work doesn't seem to be critical to
making this change. The ROI seems too low.

Thanks,
-Stolee