[PATCH 0/3] revision: use priority queue for streaming walks
From: Kristofer Karlsson via GitGitGadget <hidden>
Date: 2026-05-27 15:50:05
This is a follow-up to kk/limit-list-optim (now in master), which replaced the O(N) sorted linked-list insertion in limit_list() with a priority queue. In the review thread for that patch, I mentioned that the same approach could be applied to the streaming (non-limited) walk in get_revision_1(). Junio suggested doing it as a separate topic, and Peff noted he had a local branch (jk/revs-commits-prio-queue) doing a broader conversion of revs->commits to prio_queue entirely. This series takes a lighter-weight approach: it keeps the linked list for setup and external callers, and adds a separate commit_queue field that the streaming walk drains into on first use. This avoids touching bisect, line-log, and list-objects code, at the cost of a "only one should be non-empty" invariant between the two fields. Together with the limit_list() change already in master, this eliminates all commit_list_insert_by_date() callers from revision.c. Patch 1 is a small leak fix -- a missing release_revisions() call in pack-objects that becomes visible once the commit queue uses a dynamically allocated prio_queue array. Patch 2 introduces a rev_walk_mode enum to replace the repeated if/else chains in get_revision_1(). The function dispatches on walk mode in multiple places (next commit, expand parents, flag clearing) and these chains must stay in sync. The enum resolves the mode once and both dispatch sites switch on the same variable. This is a lighter alternative to the vtable-based refactoring I mentioned before. No functional change. Patch 3 is the actual conversion of the streaming walk to use a priority queue. == Why this helps == The streaming walk in get_revision_1() inserts newly discovered parent commits into a date-sorted queue. On master, this uses commit_list_insert_by_date(), which walks the linked list to find the insertion point -- O(w) per insert, where w is the queue width (active walk frontier). In merge-heavy repositories, the walk frontier stays wide: Repository Commits Peak width Avg width ======================================= monorepo (2.4M) 2,420K 2,653 1,700 linux.git 1,445K 581 235 git.git 82K 188 82 On the monorepo, each of the 2.4M commits requires scanning an average of 1,700 list entries to find the insertion point. With the priority queue, this drops to ~11 heap comparisons. == Benchmarks == All benchmarks: best of 3 runs, same machine, commit-graph present. Streaming walks (affected by this series): git rev-list --count HEAD (monorepo, 2.4M commits) master: 17.94s patched: 3.38s (5.3x faster) git rev-list HEAD (monorepo, full output) master: 27.72s patched: 8.61s (2.8x faster, I/O-bound fraction unchanged) Regression checks -- non-merge-heavy repos (streaming path, but frontier stays narrow so O(w) insertion was never the bottleneck): git rev-list --count HEAD (linux.git, 1.4M commits) master: 1.76s patched: 1.81s (no change) git rev-list HEAD (linux.git, full output) master: 4.46s patched: 4.52s (no change) git rev-list --count HEAD (git.git, 82K commits) master: 83ms patched: 86ms (no change) Regression checks -- other walk modes (not affected by this series): git rev-list --count HEAD~5000...HEAD (monorepo, limited path) master: 7.36s patched: 7.02s (no change) == Profile breakdown == perf profiling of rev-list --count HEAD on the monorepo shows where the time goes: master (17.94s): commit_list_insert_by_date 79% 14.25s fixed overhead (parse/lookup) 21% 3.69s patched (3.38s): heap ops (compare + sift) 16% 0.53s fixed overhead (parse/lookup) 84% 2.85s The queue maintenance itself sped up 27x (14.25s to 0.53s). The overall 5.3x is lower because the fixed costs -- object lookup (17%), commit-graph parsing (14%), memory allocation (10%) -- are roughly constant between the two versions at ~3s. This means the patch removes the dominant bottleneck entirely. After the patch, the walk cost is dominated by irreducible per-commit work (parsing and object lookup) which scales linearly with commit count regardless of frontier width. Kristofer Karlsson (3): pack-objects: call release_revisions() after cruft traversal revision: introduce rev_walk_mode to clarify get_revision_1() revision: use priority queue for non-limited streaming walks builtin/pack-objects.c | 1 + commit.c | 13 ----- commit.h | 2 - revision.c | 113 +++++++++++++++++++++++++++-------------- revision.h | 12 ++++- 5 files changed, 88 insertions(+), 53 deletions(-) base-commit: c69baaf57ba26cf117c2b6793802877f19738b0d Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-2127%2Fspkrka%2Fstreaming-prio-queue-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-2127/spkrka/streaming-prio-queue-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/2127 -- gitgitgadget