Thread (6 messages) 6 messages, 3 authors, 2018-05-14

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2018-05-04 20:36:14

On Fri, May 04 2018, Jakub Narebski wrote:

(Just off-the cuff here and I'm surely about to be corrected by
Derrick...)
* What to do about merge commits, and octopus merges in particular?
  Should Bloom filter be stored for each of the parents?  How to ensure
  fast access then (fixed-width records) - use large edge list?
You could still store it fixed with, you'd just say that if you
encounter a merge with N parents the filter wouldn't store files changed
in that commit, but rather whether any of the N (including the merge)
had changes to files as of the their common merge-base.

Then if they did you'd need to walk all sides of the merge where each
commit would also have the filter to figure out where the change(s)
was/were, but if they didn't you could skip straight to the merge base
and keep walking.

Which, on the topic of what else a commit graph could store: A mapping
from merge commits of N parents to the merge-base of those commits.

You could also store nothing for merges (or only files the merge itself
changed v.s. its parents). Derrick talked about how the bloom filter
implementation has a value that's "Didn't compute (for whatever reason),
look at it manually".
* Then there is problem of rename and copying detection - I think we can
  simply ignore it: unless someone has an idea about how to handle it?

  Though this means that "git log --follow <file>" wouldn't get any
  speedup, and neither the half of "git gui blame" that runs "git blame
  --incremental -C -C -w" -- the one that allows code copying and
  movement detection.
Couldn't the bloom filter also speed up --follow if you did two passes
through the history? The first to figure out all files that ever changed
names, and then say you did `--follow sha1-name.c` on git.git. The
filter would have had all the bits for both sha1_name.c and sha1-name.c
set on all commits that touched either for all of the history.

Of course this would only work with a given default value of -M<n>, but
on the assumption that most users left it at the default, and
furthermore that renames weren't so common as to make the filter useless
with too many false-positives as a result, it might be worth it. If you
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help