Re: [WIP RFC PATCH v2 5/5] clone: use dir-iterator to avoid explicit dir traversal
From: Ævar Arnfjörð Bjarmason <hidden>
Date: 2019-02-26 12:50:47
On Tue, Feb 26 2019, Duy Nguyen wrote:
On Tue, Feb 26, 2019 at 12:18 PM Matheus Tavares [off-list ref] wrote:quoted
Replace usage of opendir/readdir/closedir API to traverse directories recursively, at copy_or_link_directory function, by the dir-iterator API. This simplifies the code and avoid recursive calls to copy_or_link_directory. This process also makes copy_or_link_directory call die() in case of an error on readdir or stat, inside dir_iterator_advance. Previously it would just print a warning for errors on stat and ignore errors on readdir, which isn't nice because a local git clone would end up successfully even though the .git/objects copy didn't fully succeeded. Signed-off-by: Matheus Tavares <redacted> --- I can also make the change described in the last paragraph in a separate patch before this one, but I would have to undo it in this patch because dir-iterator already implements it. So, IMHO, it would be just noise and not worthy. builtin/clone.c | 45 +++++++++++++++++++++++---------------------- 1 file changed, 23 insertions(+), 22 deletions(-)diff --git a/builtin/clone.c b/builtin/clone.c index fd580fa98d..b23ba64c94 100644 --- a/builtin/clone.c +++ b/builtin/clone.c@@ -23,6 +23,8 @@ #include "transport.h" #include "strbuf.h" #include "dir.h" +#include "dir-iterator.h" +#include "iterator.h" #include "sigchain.h" #include "branch.h" #include "remote.h"@@ -411,42 +413,37 @@ static void mkdir_if_missing(const char *pathname, mode_t mode) } static void copy_or_link_directory(struct strbuf *src, struct strbuf *dest, - const char *src_repo, int src_baselen) + const char *src_repo) { - struct dirent *de; - struct stat buf; int src_len, dest_len; - DIR *dir; - - dir = opendir(src->buf); - if (!dir) - die_errno(_("failed to open '%s'"), src->buf); + struct dir_iterator *iter; + int iter_status; + struct stat st; + unsigned flags; mkdir_if_missing(dest->buf, 0777); + flags = DIR_ITERATOR_PEDANTIC | DIR_ITERATOR_FOLLOW_SYMLINKS; + iter = dir_iterator_begin(src->buf, flags); + strbuf_addch(src, '/'); src_len = src->len; strbuf_addch(dest, '/'); dest_len = dest->len; - while ((de = readdir(dir)) != NULL) { + while ((iter_status = dir_iterator_advance(iter)) == ITER_OK) { strbuf_setlen(src, src_len); - strbuf_addstr(src, de->d_name); + strbuf_addstr(src, iter->relative_path); strbuf_setlen(dest, dest_len); - strbuf_addstr(dest, de->d_name); - if (stat(src->buf, &buf)) { - warning (_("failed to stat %s\n"), src->buf); - continue; - } - if (S_ISDIR(buf.st_mode)) { - if (!is_dot_or_dotdot(de->d_name)) - copy_or_link_directory(src, dest, - src_repo, src_baselen); + strbuf_addstr(dest, iter->relative_path); + + if (S_ISDIR(iter->st.st_mode)) { + mkdir_if_missing(dest->buf, 0777);I wonder if this mkdir_if_missing is sufficient. What if you have to create multiple directories? Let's say the first advance, we hit "a". The the second advance we hit directory "b/b/b/b", we would need to mkdir recursively and something like safe_create_leading_directories() would be a better fit. I'm not sure if it can happen though. I haven't re-read dir-iterator code carefully.
This part isn't a problem. It iterates one level at a time. So given a
structure like a/b/c/d/e/f/g/h/i/j/k/some-l you'll find that if you
instrument the loop in clone.c you get:
dir = a
dir = a/b
dir = a/b/c
dir = a/b/c/d
dir = a/b/c/d/e
dir = a/b/c/d/e/f
dir = a/b/c/d/e/f/g
dir = a/b/c/d/e/f/g/h
dir = a/b/c/d/e/f/g/h/i
dir = a/b/c/d/e/f/g/h/i/j
dir = a/b/c/d/e/f/g/h/i/j/k
dir = a/b/c/d/e/f/g/h/i/j/k/some-l
So it's like the old implementation in that way. It readdir()'s and
walks directories one level at a time.