Re: [PATCH] commit: avoid parent list buildup in clear_commit_marks_many()
From: Patrick Steinhardt <hidden>
Date: 2025-02-17 06:33:13
On Thu, Feb 13, 2025 at 10:38:51PM +0100, René Scharfe wrote:
Am 12.02.25 um 08:05 schrieb Patrick Steinhardt:quoted
And this is fixed by immediately processing all commits that we currently have in the list. As `clear_commit_marks_1()` only processes immediate children of the handed-in commit we know that it will have processed the first parent, and `list` will contain remaining commits, if any.clear_commit_marks_1() processes the whole ancestral chain of first parents down to the root or first clean ancestor.quoted
We also end up adding grandchildren to the list, so this change essentially converts the algorithm from depth-first to breadth-first.It's still depth-first, but all second and higher numbered parents added to the list are cleaned before starting to clean the next commit from the input list. So we go from "clean straight down for each input commit first and clean their side branches later" to "clean all ancestors for each input commit".
Ah, fair.
quoted
I bet we can construct cases where this will perform _worse_ than the current algorithm, e.g. when you have branch thickets where every commit is a merge: But I assume that for the most common cases this might be an improvement indeed.I won't bet, but I'd like to see such a case. Can't imagine one.quoted
The question to me is: does this actually matter in the real world? It would be nice to maybe get some numbers that demonstrate the improvement in a repository.Well, the maximum list length for clear_commit_marks_many() calls with nr > 1 in the test suite goes from 12 in t6600 to 4 with the patch. Not that exciting. The question to me is: Why pile up parents in the list when we can clean them earlier with no downside? Or is there any?
If it really is without downsides then yes, it's a nice improvement. I was mostly wondering whether you're fixing something where the old way of doing things performs _significantly_ worse and where the change could lead to a user-visible improvement. Like, requiring us to store orders of magnitudes less commits at the same time. Patrick