Re: Slow git pack-refs --all

From: Martin Fick <hidden>
Date: 2026-01-16 20:35:43

From: Jeff King <redacted> Sent: Thursday, January 15, 2026 2:09 PM

...
And the remaining third was a bit all over the place with small sections,
the largest two of those sections being:

packed_refs_store_create ~8.7%
 unknown 4.4%
 memchr 4.4%
 page_fault 4.4%

Hmm, I don't think we have a function "packed_refs_store_create". Did
you typo while transferring the name over?

Yes, this should have been packed_ref_store_create (singular), sorry.

At any rate, we can assume this is poking through the packed-refs file
itself, looking for trailing newlines via memchr.

But why would we do that immediately when creating the packed-refs store
in memory? In modern versions of Git, we try to avoid reading the
packed-refs file as much as possible, binary-searching when we can. Of
course that means it has to be sorted, which was not something promised
by the original format. So we have a "sorted" tag that we write. E.g.,
this is from my clone of git, packed with git itself:
... 
  # pack-refs with: peeled fully-peeled sorted
...
  jgit pack-refs --all

That gives me this:

  # pack-refs with: peeled
...
Aha! So jgit is not writing out the "sorted" tag. As a result, when git
reads the file, its logic is:

  1. Check for the sorted tag. It's not here, so...
  2. Check if the file is sorted by reading each entry linearly. If it's
    not, then...
  3. Read it all into memory and sort the result. We can then
     binary-search that (and iterate it in sorted order, which is
     important for pack-refs).

So when git reads the packed-refs file, we are ending up at least with
step 2, an extra pass through the whole file, and maybe to step 3
(depending on whether jgit actually sorts the file).
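The three steps above can be sketched roughly as follows. This is only an illustration in Python, not git's actual code: the names parse_traits, ref_names and read_strategy are made up for the example, and the comparison is a plain string compare on refnames, which is assumed to match the file's ordering.

```python
HEADER_PREFIX = "# pack-refs with:"


def parse_traits(first_line):
    """Return the set of traits named in a packed-refs header line."""
    if not first_line.startswith(HEADER_PREFIX):
        return set()
    return set(first_line[len(HEADER_PREFIX):].split())


def ref_names(lines):
    """Yield refnames from entry lines, skipping the header and '^' peel lines."""
    for line in lines:
        if not line or line.startswith("#") or line.startswith("^"):
            continue
        _oid, name = line.split(" ", 1)
        yield name


def read_strategy(text):
    """Say which of the three steps a reader would end up at (hypothetical helper)."""
    lines = text.splitlines()
    traits = parse_traits(lines[0]) if lines else set()
    # Step 1: trust the "sorted" tag if the writer provided it.
    if "sorted" in traits:
        return "binary-search"
    # Step 2: no tag, so walk every entry once to see whether it is sorted anyway.
    names = list(ref_names(lines))
    if all(a <= b for a, b in zip(names, names[1:])):
        return "linear-check"
    # Step 3: not sorted, so everything must be read into memory and sorted.
    return "linear-check+sort"
```

With a header like the jgit one above (only "peeled"), this sketch always pays for at least the linear check, even when the entries happen to be in order.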

You mentioned that Gerrit writes the packed-refs file directly itself,
presumably using jgit. So it sounds like it is constantly undoing Git's
"sorted" marker, which causes git-pack-refs to spend extra effort
checking the sortedness, and rewrite the marker, which then gets hosed
again by jgit, and so on.

Agreed, this is likely the case, but not for the sorted marker, see below...

And that may explain why jgit is faster, if it is not doing the extra
sort check. If it is not even trying to maintain the sorted property
then it would be faster still (it takes one linear pass while writing
out the file, omitting entries that match our updates, and then appends
our updates at the end).

Unfortunately, this does not actually seem to be the reason.

If jgit _is_ sorting the file but not writing out the sorted marker,
then it should start doing so. ;)

Agreed, I will see to it that this gets fixed. Unfortunately, adding the 
sorted tag does not seem to speed things up. :(

If it's not sorting the file, then probably it should start doing so
(and writing the marker). This will make subsequent reads much faster
(mmap + binary-search). It shouldn't even be slower to write (assuming
jgit's writes are doing the usual "rewrite the whole thing to a tempfile
and atomic-rename into place", and not taking some shortcut by appending
to the file).
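As a rough illustration of the "rewrite the whole thing to a tempfile and atomic-rename into place" approach with sorted output and the marker written, here is a minimal Python sketch. It is not how jgit or git implement it (no locking is shown), write_packed_refs is a hypothetical name, and it assumes the caller supplies a peeled value for every ref that can be peeled, since the header claims fully-peeled.

```python
import os
import tempfile


def write_packed_refs(path, refs):
    """Write refs sorted by name to a tempfile, then rename it over path.

    refs maps refname -> (oid, peeled_oid_or_None). Assumption: the caller
    passes a peeled oid for every peelable ref, so "fully-peeled" is honest.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The tempfile lives in the same directory so the rename stays atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix="packed-refs.")
    try:
        with os.fdopen(fd, "w", newline="\n") as out:
            out.write("# pack-refs with: peeled fully-peeled sorted \n")
            for name in sorted(refs):
                oid, peeled = refs[name]
                out.write(f"{oid} {name}\n")
                if peeled is not None:
                    out.write(f"^{peeled}\n")
            out.flush()
            os.fsync(out.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Only reached if the write or the rename failed; drop the tempfile.
        os.unlink(tmp_path)
        raise
```

Since the whole file is rewritten anyway, sorting the names before writing adds little to the cost of the write itself.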

FYI, jgit does seem to order things; it does not append. The resulting output
from jgit after a repack with new refs added matches that from git for all but
the header.

Unrelated to your problem, but also jgit should support the fully-peeled
tag, another thing that makes readers faster. ;)

Ironically, this is not just related, it appears to be the trigger!!! When I add
this tag (and not the sorted tag), the cache-flushed time drops down to
under 4s (from over 5 minutes)!
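For context, here is a small Python sketch of how I understand a reader can use the two peel-related header tags. This is my reading of the format, not a quote of git's code, and knows_peeled is a made-up name: with "fully-peeled", a ref without a "^" line is known not to be peelable; with only "peeled", that is assumed to hold only under refs/tags/, so other refs may need an object lookup to find out.

```python
def knows_peeled(traits, refname):
    """Return True if the file alone settles the peeled value for refname.

    traits is the set of tags from the "# pack-refs with:" header.
    """
    if "fully-peeled" in traits:
        # Every peelable ref has a "^" line; a missing one means "not peelable".
        return True
    if "peeled" in traits and refname.startswith("refs/tags/"):
        # Older header: only tags under refs/tags/ are assumed to be covered.
        return True
    return False
```

Under that reading, a header with only "peeled" leaves every ref outside refs/tags/ (for example Gerrit's refs/changes/...) unsettled, which would be consistent with the difference I am seeing, though I have not confirmed that this is where the time goes.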

I will see to it that jgit fixes this too. That should help solve my problem.

That being said, it seems like something is still broken in git here 
despite this tag being missing?

Thanks so much Peff for helping get to this point!

-Martin
