Thread (6 messages) 6 messages, 2 authors, 2025-02-23

Re: [PATCH] commit: avoid parent list buildup in clear_commit_marks_many()

From: René Scharfe <hidden>
Date: 2025-02-13 21:38:59

Am 12.02.25 um 08:05 schrieb Patrick Steinhardt:
On Sun, Feb 09, 2025 at 11:41:15AM +0100, René Scharfe wrote:
quoted
clear_commit_marks_1() clears the marks of the first parent and its
first parent and so on, and saves the higher numbered parents in a list
for later.  There is no benefit in keeping that list growing with each
handled commit.  Clear it after each run to reduce peak memory usage.
Okay. So the issue here is that `clear_commit_marks_1()` only processes
the first-parent chain of commits, and any other commits will be added
to the `struct commit_list` backlog. Consequently, merge-heavy histories
are very likely to build up quite a backlog of non-first-parent commits.
quoted
Signed-off-by: René Scharfe <redacted>
---
 commit.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/commit.c b/commit.c
index 540660359d..6efdb03997 100644
--- a/commit.c
+++ b/commit.c
@@ -780,14 +780,14 @@ static void clear_commit_marks_1(struct commit_list **plist,

 void clear_commit_marks_many(size_t nr, struct commit **commit, unsigned int mark)
 {
-	struct commit_list *list = NULL;
-
 	for (size_t i = 0; i < nr; i++) {
+		struct commit_list *list = NULL;
+
 		clear_commit_marks_1(&list, *commit, mark);
+		while (list)
+			clear_commit_marks_1(&list, pop_commit(&list), mark);
 		commit++;
 	}
-	while (list)
-		clear_commit_marks_1(&list, pop_commit(&list), mark);
 }
And this is fixed by immediately processing all commits that we
currently have in the list. As `clear_commit_marks_1()` only processes
immediate children of the handed-in commit we know that it will have
processed the first parent, and `list` will contain remaining commits,
if any.
clear_commit_marks_1() processes the whole ancestral chain of first
parents down to the root or first clean ancestor.
We also end up adding grandchildren to the list, so this change
essentially converts the algorithm from depth-first to breadth-first.
It's still depth-first, but all second and higher numbered parents added
to the list are cleaned before starting to clean the next commit from
the input list.  So we go from "clean straight down for each input
commit first and clean their side branches later" to "clean all
ancestors for each input commit".
I bet we can construct cases where this will perform _worse_ than the
current algorithm, e.g. when you have branch thickets where every commit
is a merge: But I assume that for the most common cases this might be an
improvement indeed.
I won't bet, but I'd like to see such a case.  Can't imagine one.
The question to me is: does this actually matter in the real world? It
would be nice to maybe get some numbers that demonstrate the improvement
in a repository.
Well, the maximum list length for clear_commit_marks_many() calls with
nr > 1 in the test suite goes from 12 in t6600 to 4 with the patch.  Not
that exciting.  The question to me is: Why pile up parents in the list
when we can clean them earlier with no downside?  Or is there any?

René
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help