Thread (15 messages) 15 messages, 2 authors, 2025-02-15

Re: [GSoC][PATCH] merge-recursive: optimize string_list construction

From: Elijah Newren <hidden>
Date: 2025-02-11 20:59:38

On Tue, Feb 11, 2025 at 11:43 AM Meet Soni [off-list ref] wrote:
Avoid O(n^2) complexity when building a sorted `string_list` by
constructing it unsorted and sorting it afterward, reducing the
complexity to O(n log n).
I'm tempted to say merge-recursive.[ch] is nearly dead and planned for
removal, so there's not much value in messing with it, but...it's not
dead yet, so I guess this is worthwhile.
quoted hunk ↗ jump to hunk
Signed-off-by: Meet Soni <redacted>
---
 merge-recursive.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/merge-recursive.c b/merge-recursive.c
index 5dfaf32b2c..c43b79e6ef 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -2757,24 +2757,18 @@ static int process_renames(struct merge_options *opt,
        struct string_list b_by_dst = STRING_LIST_INIT_NODUP;
        const struct rename *sre;

-       /*
-        * FIXME: As string-list.h notes, it's O(n^2) to build a sorted
-        * string_list one-by-one, but O(n log n) to build it unsorted and
-        * then sort it.  Note that as we build the list, we do not need to
-        * check if the existing destination path is already in the list,
-        * because the structure of diffcore_rename guarantees we won't
-        * have duplicates.
-        */
        for (i = 0; i < a_renames->nr; i++) {
                sre = a_renames->items[i].util;
-               string_list_insert(&a_by_dst, sre->pair->two->path)->util
+               string_list_append(&a_by_dst, sre->pair->two->path)->util
                        = (void *)sre;
        }
        for (i = 0; i < b_renames->nr; i++) {
                sre = b_renames->items[i].util;
-               string_list_insert(&b_by_dst, sre->pair->two->path)->util
+               string_list_append(&b_by_dst, sre->pair->two->path)->util
                        = (void *)sre;
        }
+       string_list_sort(&a_by_dst);
+       string_list_sort(&b_by_dst);
If the original source had duplicates, this would change behavior (the
insert function checks for duplicates while append does not).
Granted, the comment above the block points out why there aren't
duplicates, but will that be obvious to future readers now that you've
removed the whole comment?

Also, are the sources already sorted?  If so, we can avoid the manual
sort calls at the end, and drop this from O(n log n) to O(n).  Digging
through the code...it appears these are setup in get_renames() and are
sorted but by pair->one->path rather than pair->two->path, so we do
need the sorts here.

Of course, get_renames() itself utilizes string_list_insert() rather
than string_list_append. and there are a number of other
string_list_insert calls in the code (though some of the others might
be hard to restructure) -- perhaps the first line of your commit
message should have a "in process_renames" qualifier, since it's only
addressing one case?

Anyway, other than perhaps tweaking the first line of the commit
message, and not removing the whole comment, the patch looks good to
me.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help