Thread (3 messages) 3 messages, 3 authors, 2016-06-15

Re: Bug report about symlinks

From: Junio C Hamano <hidden>
Date: 2016-06-15 23:02:07

Possibly related (same subject, not in this thread)

René Scharfe [off-list ref] writes:
How about the patch below?  Before it checks if an index entry exists
in the work tree, it checks if its path includes a symlink.
Honestly, I didn't expect the fix to be in the refresh-index code
path, but doing this there sort of makes sense.

I think you, who dug to find out where to add the check, already
know this, and I am writing this mainly for myself and for the list
archive, but when the knee-jerk "has-syjmlink-leading-path missing?"
reaction came to me, two obvious optimizations also came to my mind:

 - When checking a cache entry "a/b/c/d/e" and we find out "a/b/c"
   is a symbolic link, we mark it as ~CE_VALID, but at the same
   time, we learn "a/b/c/any/thing" are also ~CE_VALID with that
   check, so we _could_, after the has_symlink_leading_path once
   returns true, so there may be an opportunity to fast-forward the
   scan, marking all paths under "a/b/c" as ~CE_VALID.

 - After finding out "a/b/c" is a symbolic link to cause anything
   underneath marked as ~CE_VALID, we also know "a/" and "a/b/"
   exists as real directories, so a later check for "a/b/some/thing"
   can start checking at "a/b/some/" without checking "a/" and "a/b/".

The latter is more important as it is a much more common case that
the shape of the working tree not to change.

But I think the implementation of has_symlink_leading_path() already
has these optimizations built inside around the (unfortunately named)
"struct cache_def", so it probably would not give us much boost to
implement such a fast-forwarding of the scan.
And do we need to use the threaded_ variant of the function here?
Hmmm, this is a tangent, but you comment made me wonder if we also
need to adjust preload_thread() in preload-index.c somehow, but we
do not touch CE_UPTODATE there, so it probably is not necessary.

The caller of refresh_cache_ent() is walking an array of sorted
pathnames aka istate->cache[] in a single-threaded fashion, possibly
with a pathspec to limit the scan.  Do you mean that we may want to
make istate->cache[] scanned by multiple threads?  I am not sure.
quoted hunk
diff --git a/read-cache.c b/read-cache.c
index 5d3c8bd..6f0057f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 		return ce;
 	}
 
+	if (has_symlink_leading_path(ce->name, ce_namelen(ce))) {
+		if (ignore_missing)
+			return ce;
+		if (err)
+			*err = ENOENT;
+		return NULL;
+	}
+
 	if (lstat(ce->name, &st) < 0) {
 		if (ignore_missing && errno == ENOENT)
 			return ce;
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help