Re: git archive generates tar with malformed pax extended attribute
From: Jeff King <hidden>
Date: 2019-06-04 20:53:51
On Sun, Jun 02, 2019 at 06:58:48PM +0200, René Scharfe wrote:
>> That sounds about right. It's basically every version of every tree
>> that has a symlink. Did it make a noticeable difference in timing?
>> Indexing the whole kernel history is already a horribly slow process. :)
>
> Right, I didn't notice a difference -- no patience for watching that
> thing to the end. But here are some numbers for v2.21.0 vs. master
> with the patch:
>
> Benchmark #1: git fsck
>   Time (mean ± σ):     307.775 s ±  9.054 s    [User: 307.173 s, System: 0.448 s]
>   Range (min … max):   294.052 s … 322.931 s    10 runs
>
> Benchmark #2: ~/src/git/git fsck
>   Time (mean ± σ):     319.754 s ±  2.255 s    [User: 318.927 s, System: 0.671 s]
>   Range (min … max):   316.376 s … 323.747 s    10 runs
>
> Summary
>   'git fsck' ran
>     1.04 ± 0.03 times faster than '~/src/git/git fsck'
I guess that's about what I'd expect. The bulk of the time in most repos will go to fscking the actual blobs, I'd think. But hitting each tree twice really is noticeable.
> Seeing only a single CPU core being stressed for that long is a bit
> sad. Checking individual objects should be relatively easy to
> parallelize, shouldn't it?
Yes. The fsck code is pretty old, and uses a very simple way of walking
over all of the packs. index-pack (which backs verify-pack these days) is
much smarter, and runs in parallel. It still takes a lock when doing the
actual fsck checks, but most of the time goes to the zlib inflation and
delta reconstruction.

There's some discussion in:

  https://public-inbox.org/git/20180816210657.GA9291@sigill.intra.peff.net/

and even some patches elsewhere in the thread here:

  https://public-inbox.org/git/20180902075528.GC18787@sigill.intra.peff.net/

and here:

  https://public-inbox.org/git/20180902085503.GA25391@sigill.intra.peff.net/

I think the big show-stopper there is how ugly it is to run the pack
verification in a separate process (and I suspect it is not just ugly
from a code point of view, but actively breaks index-pack, because it
then relies on the set of objects seen during the first phase to do its
connectivity check). So there would probably need to be some
lib-ification work on index-pack first, so that we could call it (at
least in verification mode) multiple times from inside fsck.

-Peff