Thread: 19 messages, 4 authors, 2026-01-16

Re: Slow git pack-refs --all

From: Martin Fick <hidden>
Date: 2026-01-06 23:03:37

From: Jeff King <redacted> Sent: Tuesday, January 6, 2026 3:38 AM
> On Mon, Jan 05, 2026 at 11:45:41PM +0000, Martin Fick wrote:
> > By repacking down to just one used pack and one cruft pack, with no loose
> > objects, I have confirmed that pack-refs is still slow. This rules out the
> > idea that the loose object or pack file counts were making things slow.
>
> OK, that is interesting. I'd still expect opening the objects to be the
> dominating factor, but now the load would be on jumping around the
> mmap'd packfile rather than open/read/close calls.

I believe I have confirmed this now with more testing...

First dropping the system caches and then catting the pack file to
/dev/null sped things up to under 20s!

Note that catting neither the idx nor the packed-refs file noticeably
sped things up.
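
The cache experiment above could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not anything posted in the thread: `drop_caches()` and `warm_file()` are hypothetical helper names. The first uses the Linux `/proc/sys/vm/drop_caches` interface (root only). The second does the equivalent of `cat file > /dev/null`, a plain sequential read whose only purpose is to pull the file into the page cache.

```python
import os

def drop_caches() -> None:
    """Flush dirty pages, then ask Linux to drop the page cache plus
    dentries and inodes. Linux only; requires root."""
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")

def warm_file(path: str, chunk_size: int = 1 << 20) -> int:
    """Read `path` front to back and discard the data, like
    `cat path > /dev/null`. Returns the number of bytes read."""
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            total += len(chunk)
    return total
```

Calling `drop_caches()`, then `warm_file()` on the repository's `.pack` file, then timing `git pack-refs --all` would repeat the fast case; skipping `warm_file()` gives the cold-cache baseline.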

> > OK, after discovering the strace -r and -T options, I have determined that
> > the 29K writes were all very fast in themselves. However, most of the
> > writes seem to follow each other with no other system calls in between.
> > This explains why it looks like the writes are slow, even though they aren't.
> >
> > If I tally up the time between the previous system call and each write(),
> > it adds up to the bulk of the time (4m out of 4m15s) that it takes to
> > pack refs. This tells me that no visible I/O or system calls are the problem,
> > but rather that the program itself is taking a long time between writes.
> > I very much doubt that this is heavy CPU time; rather, I am going to
> > guess that this is hidden system time spent accessing mmap'd memory.
>
> That would be consistent with reading object data from the packfile.
> We'll jump around within the packfile to get that data.

Agreed, but boy is that really bad performance!
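
The tally described above can be automated. A minimal sketch, assuming the trace was captured with something like `strace -r -T -o trace.txt git pack-refs --all` without `-f` (so there is no PID column) and that each line begins with the -r relative timestamp, which strace defines as the time since the previous system call began. That gap therefore also includes the previous call's own duration. The function name `time_before_writes` is hypothetical.

```python
import re
from typing import Iterable, Tuple

# Matches strace -r lines without a PID column, e.g.
#      0.000123 write(3, "..."..., 4096) = 4096 <0.000010>
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\w+)\(")

def time_before_writes(lines: Iterable[str]) -> Tuple[float, float, int]:
    """Return (seconds elapsed before write() calls, total seconds, write count).

    The leading number on each line is the time since the previous system
    call began, so summing it over write() lines approximates the gaps
    between the prior call and each write().
    """
    before_writes = 0.0
    total = 0.0
    writes = 0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip "+++ exited" lines, signal lines, etc.
        delta, name = float(m.group(1)), m.group(2)
        total += delta
        if name == "write":
            before_writes += delta
            writes += 1
    return before_writes, total, writes
```

Usage would be `time_before_writes(open("trace.txt"))`; if the writes' share of the total is close to the whole run, as reported above, the time is going somewhere strace cannot see, such as page faults.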

> > Could it be really slow reading the packed-refs file? I can see the
> > packed-refs file is mmap()ed before the writes start, and then
> > munmap()ed after the writes are completed. If I had to guess, that likely
> > means that the packed-refs file is being read in small increments by the
> > kernel via mmap, and that is what is making things very slow over NFS.
>
> The packed-refs file is mmap'd, but we'll be reading it sequentially. I
> guess whether or not there is good read-ahead there may depend on the
> NFS implementation.

Yeah, ruled out now by dropping the system caches, and then catting the 
packed-refs file before running git pack-refs, which did NOT help speed 
things up.

> > My alternative theory is that each ref is being looked up via a binary
> > search, but I don't think git does this?
>
> Git does binary search within the packed-refs file, but it shouldn't be
> doing so here. The write-out phase of packing refs is a straight merge
> between two lists: the existing packed-refs entries and the new entries
> we are adding.

Agreed, and I should have ruled this out by realizing that this would likely
not have been affected by the system caches in my earlier tests.
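
To make the contrast concrete, here is a toy sketch of the two strategies discussed. It is not Git's code; `merge_refs` and `lookup_each` are hypothetical names, and deletions are ignored. A single merge pass over two lists sorted by refname touches each entry once (O(n + m)), with a new entry replacing an existing one of the same name. Looking up each ref by binary search costs O(m log n) comparisons and, on an mmap'd file, could touch pages scattered across it.

```python
import bisect
from typing import List, Optional, Tuple

Ref = Tuple[str, str]  # (refname, object id); lists are sorted by refname

def merge_refs(existing: List[Ref], updates: List[Ref]) -> List[Ref]:
    """One pass over two sorted lists; an update wins over an existing
    entry with the same refname."""
    out: List[Ref] = []
    i = j = 0
    while i < len(existing) and j < len(updates):
        a, b = existing[i], updates[j]
        if a[0] < b[0]:
            out.append(a)
            i += 1
        elif a[0] > b[0]:
            out.append(b)
            j += 1
        else:  # same refname: keep the new value
            out.append(b)
            i += 1
            j += 1
    out.extend(existing[i:])  # at most one of these two is non-empty
    out.extend(updates[j:])
    return out

def lookup_each(existing: List[Ref], names: List[str]) -> List[Optional[str]]:
    """The alternative theory: one binary search per refname."""
    keys = [name for name, _ in existing]
    result: List[Optional[str]] = []
    for name in names:
        k = bisect.bisect_left(keys, name)
        result.append(existing[k][1] if k < len(keys) and keys[k] == name else None)
    return result
```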

> I'd second Patrick's suggestion to use perf or similar to try to see
> where the time is going.

Noted, thanks.

> You might also try building Git with NO_MMAP. That might make the I/O
> costs more apparent via strace, because they'll be coming via pread().

Agreed, I will try to do this. I think that the jgit results hint that this
might even eliminate most of the I/O costs (jgit is not using mmap in my
tests). It would be nice if this were a runtime config option instead of
requiring a rebuild, as some use cases might do better with mmap and some
without.
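
For readers unfamiliar with the difference Peff describes, here is a minimal Python sketch (not Git's code) of the two access patterns. Slicing an mmap'd file pulls data in through page faults, so strace shows only the mmap()/munmap() calls. Explicit pread() calls, which per Peff's note is what a NO_MMAP build ends up using, each appear in strace and can be timed with -T. The `read_range_mmap` and `read_range_pread` names are hypothetical.

```python
import mmap
import os

def read_range_mmap(path: str, offset: int, length: int) -> bytes:
    """Map the whole file and slice it. Page faults, not system calls,
    bring the data in. Note: mapping an empty file raises ValueError."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return m[offset:offset + length]

def read_range_pread(path: str, offset: int, length: int) -> bytes:
    """Read the same range with explicit pread() calls, each of which is a
    visible system call under strace."""
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while length > 0:
            chunk = os.pread(fd, length, offset)
            if not chunk:  # reached end of file
                break
            chunks.append(chunk)
            offset += len(chunk)
            length -= len(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

The pread loop handles short reads, since a single pread() call is not guaranteed to return the full requested length.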

Thanks for all the input,

-Martin