Re: We should add a "git gc --auto" after "git clone" due to commit graph

From: SZEDER Gábor <hidden>
Date: 2018-10-08 18:10:23

On Mon, Oct 08, 2018 at 12:57:34PM -0400, Derrick Stolee wrote:
> On 10/8/2018 12:41 PM, SZEDER Gábor wrote:
> > On Wed, Oct 03, 2018 at 03:18:05PM -0400, Jeff King wrote:
> > > I'm still excited about the prospect of a bloom filter for paths which
> > > each commit touches. I think that's the next big frontier in getting
> > > things like "git log -- path" to a reasonable run-time.
> >
> > There is certainly potential there.  With a (very) rough PoC
> > experiment, an 8MB bloom filter, and a carefully chosen path I can
> > achieve a nice, almost 25x speedup:
> >
> >   $ time git rev-list --count HEAD -- t/valgrind/valgrind.sh
> >   6
> >
> >   real    0m1.563s
> >   user    0m1.519s
> >   sys     0m0.045s
> >
> >   $ time GIT_USE_POC_BLOOM_FILTER=y ~/src/git/git rev-list --count HEAD -- t/valgrind/valgrind.sh
> >   6
> >
> >   real    0m0.063s
> >   user    0m0.043s
> >   sys     0m0.020s
> >
> >   bloom filter total queries: 16269 definitely not: 16195 maybe: 74 false positives: 64 fp ratio: 0.003934
>
> Nice! These numbers make sense to me, in terms of how many TREESAME queries
> we actually need to perform for such a query.

Yeah...  because you didn't notice that I deliberately cheated :)

As it turned out, it's not just about the number of diff queries that
we can spare, but, for the speedup _ratio_, it's more about how
expensive those diff queries are.
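
To make the "diff queries that we can spare" part concrete, here is a minimal Python sketch of the general idea. It is not the PoC from this thread (that code was not published); the class, the key format, the filter size and the hash scheme are all assumptions chosen for illustration. Each (commit, changed path) pair is added to the filter, and a path-limited walk runs the real tree diff only for commits where the filter answers "maybe".

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter. num_bits and num_hashes are illustrative, not tuned."""

    def __init__(self, num_bits=65536, num_hashes=7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive all bit positions from a single SHA-1 digest.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def maybe_contains(self, key):
        # False means "definitely not"; True means "maybe" (can be a false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


def changed_path_key(commit_id, path):
    # Hypothetical key format: one filter entry per (commit, changed path) pair.
    return commit_id + ":" + path


def commits_needing_diff(bloom, commit_ids, path):
    """Return the commits answered with "maybe"; only these need a real tree diff."""
    return [c for c in commit_ids
            if bloom.maybe_contains(changed_path_key(c, path))]


# Toy history: only commit "c2" touched the path.
bloom = BloomFilter()
bloom.add(changed_path_key("c2", "t/valgrind/valgrind.sh"))
candidates = commits_needing_diff(bloom, ["c1", "c2", "c3"], "t/valgrind/valgrind.sh")
print(candidates)
```

A Bloom filter never gives a false negative, so skipping the "definitely not" commits cannot change the result of the walk; the false positives only cost an unnecessary diff.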

git.git has a rather flat hierarchy, and 't/' is the 372nd entry in
the current root tree object, while 'valgrind/' is the 923rd entry,
and the diff machinery spends considerable time wading through the
previous entries.  Notice the "carefully chosen path" remark in my
previous email; I think this particular path has the highest number of
preceding tree entries, and, in addition, 't/' changes rather
frequently, so the diff machinery often has to scan two relatively big
tree objects.  Had I chosen 'Documentation/RelNotes/1.5.0.1.txt'
instead, i.e. another path two directories deep, but whose leading
path components are both near the beginning of the tree objects, the
speedup would be much less impressive: 0.282s vs. 0.049s, i.e. "only"
~5.7x instead of ~24.8x.
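
The ratios and the false-positive rate can be rechecked from the figures quoted in this message. The short Python snippet below only redoes that arithmetic; the variable names are mine, and the numbers are copied from the timings and the trace line above.

```python
# Wall-clock timings in seconds (without filter, with filter), from this message.
late_without, late_with = 1.563, 0.063    # t/valgrind/valgrind.sh
early_without, early_with = 0.282, 0.049  # Documentation/RelNotes/1.5.0.1.txt

# Counts from the bloom filter trace line.
total_queries, false_positives = 16269, 64

late_speedup = late_without / late_with
early_speedup = early_without / early_with
fp_ratio = false_positives / total_queries

print(f"t/valgrind/valgrind.sh speedup: {late_speedup:.2f}x")
print(f"Documentation/RelNotes/1.5.0.1.txt speedup: {early_speedup:.2f}x")
print(f"fp ratio: {fp_ratio:.6f}")
```

Note that the filtered runs take nearly the same time for both paths (0.063s vs. 0.049s); the ratios differ mainly because the unfiltered baselines differ (1.563s vs. 0.282s), which is the point about how expensive the spared diff queries are.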

> > But I'm afraid it will take a while until I get around to turning it into
> > something presentable...
>
> Do you have the code pushed somewhere public where one could take a
> look? I could provide some early feedback.

Nah, definitely not...  I know full well how embarrassingly broken this
implementation is, I don't need others to tell me that ;)
