Thread (13 messages) 13 messages, 3 authors, 2016-06-15

Re: [PATCH v3 4/9] index-helper: new daemon for caching index and related stuff

From: Eric Sunshine <hidden>
Date: 2016-06-15 23:02:05

On Monday, July 28, 2014, Nguyễn Thái Ngọc Duy [off-list ref] wrote:
Instead of reading the index from disk and worrying about disk
corruption, the index is cached in memory (memory bit-flips happen
too, but hopefully less often). The result is faster read.

The biggest gain is not having to verify the trailing SHA-1, which
takes lots of time especially on large index files. But this also
opens doors for further optimiztions:
s/optimiztions/optimizations/
 - we could create an in-memory format that's essentially the memory
   dump of the index to eliminate most of parsing/allocation
   overhead. The mmap'd memory can be used straight away.

 - we could cache non-index info such as name hash

The shared memory's name folows the template "git-<object>-<SHA1>"
s/folows/follows/
where <SHA1> is the trailing SHA-1 of the index file. <object> is
"index" for cached index files (and may be "name-hash" for name-hash
cache). If such shared memory exists, it contains the same index
content as on disk. The content is already validated by the daemon and
git won't validate it again (except comparing the trailing SHA-1s).

Git can poke the daemon to tell it to refresh the index cache, or to
keep it alive some more minutes via UNIX signals. It can't give any
real index data directly to the daemon. Real data goes to disk first,
then the daemon reads and verifies it from there.

$GIT_DIR/index-helper.pid contains a reference to daemon process (and
it's pid on *nix). The file's mtime is updated every time it's accessed
s/it's pid/its pid/
quoted hunk ↗ jump to hunk
(or should be updated often enough). Old index-helper.pid is considered
stale and ignored.

Signed-off-by: Nguyễn Thái Ngọc Duy <redacted>
---
diff --git a/index-helper.c b/index-helper.c
new file mode 100644
index 0000000..8fa0af9
--- /dev/null
+++ b/index-helper.c
@@ -0,0 +1,157 @@
+static void share_index(struct index_state *istate, struct index_shm *is)
+{
+       void *new_mmap;
+       if (istate->mmap_size <= 20 ||
+           hashcmp(istate->sha1,
+                   (unsigned char *)istate->mmap + istate->mmap_size - 20) ||
+           !hashcmp(istate->sha1, is->sha1) ||
+           git_shm_map(O_CREAT | O_EXCL | O_RDWR, 0700, istate->mmap_size,
+                       &new_mmap, PROT_READ | PROT_WRITE, MAP_SHARED,
+                       "git-index-%s", sha1_to_hex(istate->sha1)) < 0)
Do any of these failure conditions deserve a diagnostic message to let
the user know that there was a problem?
quoted hunk ↗ jump to hunk
+               return;
+
+       free_index_shm(is);
+       is->size = istate->mmap_size;
+       is->shm = new_mmap;
+       hashcpy(is->sha1, istate->sha1);
+       memcpy(new_mmap, istate->mmap, istate->mmap_size - 20);
+
+       /*
+        * The trailing hash must be written last after everything is
+        * written. It's the indication that the shared memory is now
+        * ready.
+        */
+       hashcpy((unsigned char *)new_mmap + istate->mmap_size - 20, is->sha1);
+}
diff --git a/read-cache.c b/read-cache.c
index b679781..ff28142 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1465,6 +1467,65 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk,
        return ce;
 }

+/*
+ * Try to open and verify a cached shm index if available. Return 0 if
+ * succeeds (istate->mmap and istate->mmap_size are updated). Return
+ * negative otherwise.
+ */
+static int try_shm(struct index_state *istate)
+{
+       void *new_mmap = NULL;
+       size_t old_size = istate->mmap_size;
Is old_size needed? Can you not simply reference istate->mmap_size
directly each place old_size is mentioned?
+       ssize_t new_length;
Nit: 'size' vs. 'length' inconsistency in variable name choices.
quoted hunk ↗ jump to hunk
+       const unsigned char *sha1;
+       struct stat st;
+
+       if (old_size <= 20 ||
+           stat(git_path("index-helper.pid"), &st))
+               return -1;
+       sha1 = (unsigned char *)istate->mmap + old_size - 20;
+       new_length = git_shm_map(O_RDONLY, 0700, -1, &new_mmap,
+                                PROT_READ, MAP_SHARED,
+                                "git-index-%s", sha1_to_hex(sha1));
+       if (new_length <= 20 ||
+           hashcmp((unsigned char *)istate->mmap + old_size - 20,
+                   (unsigned char *)new_mmap + new_length - 20)) {
+               if (new_mmap)
+                       munmap(new_mmap, new_length);
+               poke_daemon(&st, 1);
+               return -1;
+       }
+       munmap(istate->mmap, istate->mmap_size);
+       istate->mmap = new_mmap;
+       istate->mmap_size = new_length;
+       poke_daemon(&st, 0);
+       return 0;
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int must_exist)
 {
diff --git a/shm.c b/shm.c
new file mode 100644
index 0000000..4ec1a00
--- /dev/null
+++ b/shm.c
@@ -0,0 +1,67 @@
+#include "git-compat-util.h"
+#include "shm.h"
+
+#ifdef HAVE_SHM
+
+#define SHM_PATH_LEN 72                /* we don't create very long paths.. */
+
+ssize_t git_shm_map(int oflag, int perm, ssize_t length, void **mmap,
+                   int prot, int flags, const char *fmt, ...)
+{
+       va_list ap;
+       char path[SHM_PATH_LEN];
Is there a reason to avoid strbuf here?
+       int fd;
+
+       path[0] = '/';
+       va_start(ap, fmt);
+       vsprintf(path + 1, fmt, ap);
+       va_end(ap);
+       fd = shm_open(path, oflag, perm);
+       if (fd < 0)
+               return -1;
+       if (length > 0 && ftruncate(fd, length)) {
+               shm_unlink(path);
+               close(fd);
+               return -1;
+       }
+       if (length < 0 && !(oflag & O_CREAT)) {
+               struct stat st;
+               if (fstat(fd, &st))
+                       die_errno("unable to stat %s", path);
+               length = st.st_size;
+       }
+       *mmap = xmmap(NULL, length, prot, flags, fd, 0);
+       close(fd);
+       if (*mmap == MAP_FAILED) {
+               *mmap = NULL;
+               shm_unlink(path);
+               return -1;
+       }
+       return length;
+}
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help