Re: [PATCH 0/2] Per-commit filter proof of concept

From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2018-10-11 07:37:46

On Thu, Oct 11 2018, Jonathan Tan wrote:

> Using per-commit filters and restricting the bloom filter to a single
> parent increases the relative power of the filter in omitting tree
> inspections compared to the original (107/53096 vs 1183/66459), but the
> lack of coverage w.r.t. the non-first parents had a more significant
> effect than I thought (1.29s vs .24s). It might be best to have one
> filter for each (commit, parent) pair (or, at least, the first two
> parents of each commit - we probably don't need to care that much about
> octopus merges) - this would take up more disk space than if we only
> store filters for the first parent, but is still less than the original
> example of storing information for all commits in one filter.
>
> There are more possibilities like dynamic filter sizing, different
> hashing, and hashing to support wildcard matches, which I haven't looked
> into.

Another way to deal with that is to have the filter store changes since
the merge base. From an e-mail of mine back in May[1], when this was
discussed:

    From: Ævar Arnfjörð Bjarmason [off-list ref]
    Date: Fri, 04 May 2018 22:36:07 +0200
    Message-ID: [ref] (raw)

    On Fri, May 04 2018, Jakub Narebski wrote:

    (Just off-the-cuff here and I'm surely about to be corrected by
    Derrick...)

    > * What to do about merge commits, and octopus merges in particular?
    >   Should Bloom filter be stored for each of the parents?  How to ensure
    >   fast access then (fixed-width records) - use large edge list?

    You could still store it fixed-width, you'd just say that if you
    encounter a merge with N parents the filter wouldn't store files changed
    in that commit, but rather whether any of the N (including the merge)
    had changes to files as of their common merge-base.

    Then if they did you'd need to walk all sides of the merge where each
    commit would also have the filter to figure out where the change(s)
    was/were, but if they didn't you could skip straight to the merge base
    and keep walking.
    [...]

Ideas are cheap and I don't have any code to back that up, just thought
I'd mention it if someone found it interesting.
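
To make the quoted idea a bit more concrete, here is a rough Python
sketch. It is not Git code: the names (walk_with_merge_filters,
merge_filters) are made up for illustration, the merge base is assumed
to be precomputed and stored next to the filter, and a plain set stands
in for the Bloom filter so the example stays deterministic.

```python
def walk_with_merge_filters(tip, parents, changed, merge_filters, path):
    """Find commits touching `path`, skipping merged-in sides when possible.

    parents:       commit -> list of parent commits
    changed:       commit -> paths changed by that commit
    merge_filters: merge commit -> (paths changed on any side since the
                   merge base, merge base)  [hypothetical layout]
    Returns (commits that changed `path`, number of commits visited).
    """
    hits, seen, stack = [], set(), [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        entry = merge_filters.get(commit)
        if entry is not None:
            paths_since_base, base = entry
            if path not in paths_since_base:
                # No side of the merge touched the path: jump to the base.
                stack.append(base)
                continue
        if path in changed.get(commit, ()):
            hits.append(commit)
        stack.extend(parents.get(commit, ()))
    return hits, len(seen)


# M merges P1 and P2, whose common merge base is X; R is X's parent.
parents = {"M": ["P1", "P2"], "P1": ["X"], "P2": ["X"], "X": ["R"], "R": []}
changed = {"P1": ["a"], "P2": ["b"], "X": ["c"], "R": ["d"]}
merge_filters = {"M": ({"a", "b"}, "X")}

skipped = walk_with_merge_filters("M", parents, changed, merge_filters, "c")
full = walk_with_merge_filters("M", parents, changed, merge_filters, "a")
```

With a real Bloom filter the "not in" test could only err towards
walking the sides unnecessarily, never towards skipping a change.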

Thinking about this again, I wonder if something like that could be
generalized further. In the abstract, the idea is really whether we can
store a filter for N commits so that we can skip across all N in the walk
as an optimization; doing this for merges is just an implementation detail.

So what if the bloom filters were this sort of structure:

    <commit_the_filter_is_for> = [<bloom bitmap>, <next commit with filter>]

So e.g. given a history like ("->" = parent relationship):

    A -> B
    B -> C
    C -> D
    D -> E
    E -> F
    F -> G

We could store:

    A -> B [<bloom bitmap for A..D>, D]
    B -> C
    C -> D
    D -> E [<bloom bitmap for D..F>, F]
    E -> F
    F -> G [<bloom bitmap for F..G>, G]

Note how the bitmaps aren't evenly spaced. That's because some algorithm
would have walked the graph and e.g. decided that from A..D we had few
enough changes that the bitmap should apply for 4 commits, and then 3
for the next set etc. Whether some range was worth extending could just
be a configurable implementation detail.
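
A rough Python sketch of how a walk might use such a structure on a
linear history follows. All names here (Bloom, build_range_filters,
walk) are hypothetical, the Bloom filter is a toy, and the sketch
assumes that the filter stored at a commit covers the changes made by
that commit and its ancestors up to, but not including, the "next commit
with filter".

```python
import hashlib


class Bloom:
    """Toy Bloom filter over path names (illustrative parameters)."""

    def __init__(self, nbits=256, nhashes=3):
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0

    def _positions(self, path):
        for i in range(self.nhashes):
            digest = hashlib.sha1(f"{i}:{path}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.nbits

    def add(self, path):
        for pos in self._positions(path):
            self.bits |= 1 << pos

    def maybe_contains(self, path):
        return all((self.bits >> pos) & 1 for pos in self._positions(path))


def build_range_filters(parent, changed, ranges):
    """Map start -> (bloom for start..end, end) for each (start, end)."""
    filters = {}
    for start, end in ranges:
        bloom, commit = Bloom(), start
        while commit != end:  # assumption: `end` itself is not covered
            for path in changed.get(commit, ()):
                bloom.add(path)
            commit = parent[commit]
        filters[start] = (bloom, end)
    return filters


def walk(tip, parent, changed, filters, path):
    """Return (commits that changed `path`, number of commits inspected)."""
    hits, inspected, commit = [], 0, tip
    while commit is not None:
        entry = filters.get(commit)
        if entry is not None and not entry[0].maybe_contains(path):
            commit = entry[1]  # definitely untouched in this range: skip it
            continue
        inspected += 1  # stands in for the expensive tree comparison
        if path in changed.get(commit, ()):
            hits.append(commit)
        commit = parent.get(commit)
    return hits, inspected


parent = {"A": "B", "B": "C", "C": "D", "D": "E", "E": "F", "F": "G", "G": None}
changed = {c: [c.lower() + ".c"] for c in parent}
filters = build_range_filters(parent, changed, [("A", "D"), ("D", "F"), ("F", "G")])
hits, inspected = walk("A", parent, changed, filters, "e.c")
```

Because a Bloom filter can give false positives but not false
negatives, the walk may inspect more commits than strictly needed, but
it should never miss a commit that changed the path.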

1. https://public-inbox.org/git/87h8nnxio8.fsf@evledraar.gmail.com/
