Thread (4 messages), 3 authors, 2016-06-15

Re: Minimum git commit abbrev length (Was Re: -tip: origin tree build failure (was: [GIT PULL] ext4 update) for 2.6.37)

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: 2010-10-28 18:56:01
Also in: lkml

On Thu, Oct 28, 2010 at 11:28 AM, Linus Torvalds
[off-list ref] wrote:
> Yes. The default of 7 (I think) comes from fairly early in git
> development, when seven hex digits was a lot (it covers about 250+
> million hash values). Back then I thought that 65k revisions was a lot
> (it was what we were about to hit in BK), and each revision tends to
> be about 5-10 new objects or so, so a million objects was a big
> number.
>
> These days, the kernel isn't even the largest git project, and even
> the kernel has about 220k revisions (_much_ bigger than the BK tree
> ever was) and we are approaching two million objects. At that point,
> seven hex digits is still unique for a lot of them, but when we're
> talking about just two orders of magnitude difference between number
> of objects and the hash size, there _will_ be hash collisions. It's no
> longer even close to unrealistic - it happens all the time.

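
The claim that collisions become routine at this scale can be checked with the standard birthday approximation: among n uniformly random values abbreviated to d hex digits, roughly n^2 / (2 * 16^d) pairs are expected to share a prefix. The helper below is only an illustrative sketch; the name expected_pairs is hypothetical and it assumes SHA-1 prefixes behave like uniform random values.

```shell
# Expected number of colliding pairs among n random objects abbreviated to
# d hex digits, using the birthday approximation n^2 / (2 * 16^d).
# expected_pairs is a hypothetical helper name, not part of git.
expected_pairs() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.0f\n", n * n / (2 * 16 ^ d) }'
}

# Usage: expected_pairs <number-of-objects> <hex-digits>
#   expected_pairs 2000000 7
```

For an even two million objects and 7 digits this gives roughly 7,450 colliding pairs, the same order of magnitude as the counts measured later in this message. In this approximation each extra digit divides the estimate by 16, while the estimate itself grows with the square of the object count.
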
Hmm. In fact, in the kernel, we currently have about twelve thousand
objects that end up having collisions in 7 hex digits. Even in the old
historical BK kernel tree, we have over a thousand objects that
collide (each bucket in both cases gets just two objects, there are as
of yet no multiple collisions, which is what you'd expect with a good
hash). See with

  git rev-list --objects --all | cut -c1-7 | sort | uniq -dc

and in fact git itself has a few collisions (but currently just 44
objects ending up sharing 22 SHA1 buckets in 7 digits).
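
The pipeline above lists each colliding prefix with its count. A small wrapper can reduce that to the two totals quoted here (buckets and objects). This is a sketch under assumptions: collision_summary is a hypothetical name, and it reads object names from standard input so that it can be fed from git rev-list or from a saved list.

```shell
# Summarize abbreviation collisions for object names read from stdin.
# $1 is the abbreviation length in hex digits (defaults to 7).
# collision_summary is a hypothetical helper name, not part of git.
collision_summary() {
    digits=${1:-7}
    # uniq -dc prints "<count> <prefix>" for every prefix seen more than once.
    cut -c1-"$digits" | sort | uniq -dc |
        awk '{ buckets++; objects += $1 }
             END { printf "%d buckets, %d objects\n", buckets, objects }'
}

# Usage (inside a repository):
#   git rev-list --objects --all | collision_summary 7
```
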

With each digit, you'd expect the collisions to decrease by a factor
of 16, and that is indeed exactly what happens. For my current kernel
tree I get:

 - 7 digits: 5823 buckets with duplicates (i.e. 11646 objects that aren't unique)
 - 8: 406
 - 9: 30
 - 10: 1
 - 11: 0

so 12 hex digits is indeed pretty safe for the kernel, and is likely
to remain so until the kernel history grows by a factor of 16.
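
The per-length counts above can be reproduced by repeating the same pipeline for each abbreviation length. The loop below is a minimal sketch, not the exact commands used for the numbers in this message; collisions_by_length and objects.txt are hypothetical names, and the object list is saved to a file first so that git rev-list only runs once.

```shell
# Print the number of colliding prefixes ("buckets") for each abbreviation
# length from $1 to $2, reading full object names from the file $3.
# collisions_by_length is a hypothetical helper name, not part of git.
collisions_by_length() {
    d=$1
    while [ "$d" -le "$2" ]; do
        # uniq -d prints one line per prefix that occurs more than once;
        # tr strips the padding some wc implementations add.
        n=$(cut -c1-"$d" "$3" | sort | uniq -d | wc -l | tr -d ' ')
        echo "$d: $n"
        d=$((d + 1))
    done
}

# Usage (inside a repository; objects.txt is a scratch file):
#   git rev-list --objects --all > objects.txt
#   collisions_by_length 7 11 objects.txt
```
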

                        Linus