Re: [PATCH v7 19/31] merge-recursive: add get_directory_renames()
From: Stefan Beller <hidden>
Date: 2018-02-05 19:39:49
/*
* For
* "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
* the "e/foo.c" part is the same, we just want to know that
* "a/b/c/d" was renamed to "a/b/some/thing/else"
* so, for this example, this function returns "a/b/c/d" in
* *old_dir and "a/b/some/thing/else" in *new_dir.
*
* Also, if the basename of the file changed, we don't care. We
* want to know which portion of the directory, if any, changed.
*/
Is that better?yes, that helps me understanding pretty much.
quoted
quoted
+ * + * Also, if the basename of the file changed, we don't care. We + * want to know which portion of the directory, if any, changed. + */ + end_of_old = strrchr(old_path, '/'); + end_of_new = strrchr(new_path, '/'); + + if (end_of_old == NULL || end_of_new == NULL) + return;return early as the files are in the top level, and apparently we do not support top level renaming? What about a commit like 81b50f3ce4 (Move 'builtin-*' into a 'builtin/' subdirectory, 2010-02-22) ? Well that specific commit left many files outside the new builtin dir, but conceptually we could see a directory rename of /* => /src/*We had some internal usecases for want to support a "rename" of the toplevel directory into a subdirectory of itself. However, attempting to support that opened much too big a can of worms for me. We'd open up some big surprises somewhere.
ok, I was just trying to understand the code. (As said before, I was not asking for having this implemented.)
In particular, note that not supporting a "rename" of the toplevel
directory is a special case of not supporting a "rename" of any
directory to a subdirectory below itself, which in turn is a very
special case of excluding partial directory renames. I addressed this
in the cover letter of my original submission, as follows:
"""
Further, there's a basic question about when directory rename detection
should be applied at all. I have a simple rule:
3) If a given directory still exists on both sides of a merge, we do
not consider it to have been renamed.
Rule 3 may sound obvious at first, but it will probably arise as a
question for some users -- what if someone "mostly" moved a directory but
still left some files around, or, equivalently (from the perspective of the
three-way merge that merge-recursive performs), fully renamed a directory
in one commmit and then recreated that directory in a later commit adding
some new files and then tried to merge? See the big comment in section 4
of the new t6043 for further discussion of this rule.
"""
Patch 04/31 is the one that adds that big comment with further discussion.
Maybe there's a way to support toplevel renames, but I think it'd make
this series longer and more complicated...and may cause more edge
cases that confuse users.Thanks for reminding!
quoted
quoted
+ while (*--end_of_new == *--end_of_old && + end_of_old != old_path && + end_of_new != new_path) + ; /* Do nothing; all in the while loop */We have to compare manually as we'd want to find the first non-equal and there doesn't seem to be a good library function for that. Assuming many repos are UTF8 (including in their paths), how does this work with display characters longer than one char? It should be fine as we cut at the slash?Oh, UTF-8. Ugh. Can UTF-8 characters, other than '/', have a byte whose value matches (unsigned char)('/')? If so, then I'll need to figure out how to do utf-8 character parsing. Anyone have pointers?
Oh right we are always cutting at '/', which hex 2F, so we cannot split a codepoint in half accidentally (finding the slash inside a codepoint). And renaming a directory from one utf8 codepoint to another, which have the same prefix is not an issue at this layer, too. (Think of abc -> abd just all of abc/d in one code point, but there but even for multi code points/ASCII we repeat the prefix when printing the rename)
quoted
So the first loop is about counting the number of files in each directory that are renamed and the later while loop is about mapping them?Close; would adding the following comment at the top of the function help? /* * Typically, we think of a directory rename as all files from a * certain directory being moved to a target directory. However, * what if someone first moved two files from the original * directory in one commit, and then renamed the directory * somewhere else in a later commit? At merge time, we just know * that files from the original directory went to two different * places, and that the bulk of them ended up in the same place. * We want each directory rename to follow the bulk of the files * from that directory. This function exists to find where the * bulk of the files went. * * The first loop below simply iterates through the list of * renames, adding up all the rename source->target locations with * a count. * * The second loop compares the count for each renamed directory * and declares the "winning" target location. */ Is there any part that remains unclear with that comment? (Also, is that comment too long?)
Sounds good to me. Thanks!
quoted
quoted
+ /* Strings were xstrndup'ed before inserting into string-list, + * so ask string_list to remove the entries for us. + */comment style.Thanks.quoted
quoted
+ entry->possible_new_dirs.strdup_strings = 1;Why do we need to set strdup_strings here (so late in the game, we are about to clear it?) Could we set it earlier? Or rather have the string list duplicate the strings instead of get_renamed_dir_portion ?We didn't strdup the original string (a file path) as-is, we strndup'ed a subset of the original string (just the relevant portion of the directory). Since we already had to allocate a special string for that, making the string list duplicate the strings would have caused an extra unnecessary allocation and required us to free the memory allocated by get_renamed_dir_portion manually. So, we do need this here. Does this comment make it clearer?: /* * The relevant directory sub-portion of the original full * filepaths were xstrndup'ed before inserting into * possible_new_dirs, and instead of manually iterating the * list and free'ing each, just lie and tell * possible_new_dirs that it did the strdup'ing so that it * will free them for us. */ entry->possible_new_dirs.strdup_strings = 1; string_list_clear(&entry->possible_new_dirs, 1);
Phrased this way, it makes sense. Thanks, Stefan