Thread (61 messages) 61 messages, 5 authors, 2018-02-16

Re: [PATCH v7 17/31] merge-recursive: add a new hashmap for storing directory renames

From: Elijah Newren <hidden>
Date: 2018-02-03 21:34:39

On Fri, Feb 2, 2018 at 4:26 PM, Stefan Beller [off-list ref] wrote:
On Tue, Jan 30, 2018 at 3:25 PM, Elijah Newren [off-list ref] wrote:
quoted
+static void dir_rename_init(struct hashmap *map)
+{
+       hashmap_init(map, (hashmap_cmp_fn) dir_rename_cmp, NULL, 0);
See 55c965f3a2 (Merge branch 'sb/hashmap-cleanup', 2017-08-11) or rather
201c14e375 (attr.c: drop hashmap_cmp_fn cast, 2017-06-30) for an attempt to
keep out the casting in the init, but cast the void->type inside the function.
Fixed (locally).
quoted
+struct dir_rename_entry {
+       struct hashmap_entry ent; /* must be the first member! */
+       char *dir;
+       unsigned non_unique_new_dir:1;
+       struct strbuf new_dir;
+       struct string_list possible_new_dirs;
+};
Could you add comments what these are and if they have any constraints?
e.g. is 'dir' the full path from the root of the repo and does it end
with a '/' ?
Good idea, will do.  To help you out with the rest of your review:

yes, dir is full path from the root of the repo, but no it does not
end with a '/'.  The same two constraints hold for how I store
directories in other variables as well.

If someone renames multiple files in a single directory to several
different other directories, then possible_new_dirs is used
(temporarily) to track what those other directories are and how many
files were moved from `dir` to that directory.  The 'winner' (the
target directory with the most renames from `dir` to it) will be
recorded in new_dir.  If there is no 'winner' (a tie for target
directory with the most renames), then non_unique_new_dir is set.
Once we can set either non_unique_new_dir or possible_new_dirs, then
possible_new_dirs is unnecessary and can be freed.
Having a stringlist of potentially new dirs sounds like the algorithm is
at least n^2, but how do I know? I'll read on.
Yes, I suppose it's technically n^2, but n is expected to be O(1).
While one can trivially construct a case making n arbitrarily large,
statistically for real world repositories, I expected the mode of n to
be 1 and the mean to be less than 2.  My original idea was to use a
hash for possible_new_dirs, but since hashes are so painful in C and n
should be very small anyway, I didn't bother.  If anyone can find an
example of a real world open source repository (linux, webkit, git,
etc.) with a merge where n is greater than about 10, I'll be
surprised.

Does that address your concern, or does it sound like I'm kicking the
can down the road?  If it's the latter, we can switch it out.
This patch only adds static functions, so some compilers may even refuse
to compile after this patch (-Werror -Wunused). I am not sure how strict we
are there, but as git-bisect still hasn't learned about going only
into the first
parent to have bisect working on a "per series" level rather than a "per commit"
level, it is possible that someone will end up on this commit in the future and
it won't compile well. :/

Not sure what to recommend, maybe squash this with the patch that makes
use of these functions?
I'm unsure as well; I can combine it with 19/31 but I thought that
keeping them separate made it slightly easier to review both.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help