Thread (7 messages) 7 messages, 3 authors, 2021-07-26

Re: [PATCH 2/2] cache-tree: prefetch in partial clone read-tree

From: Elijah Newren <hidden>
Date: 2021-07-23 21:34:48

On Fri, Jul 23, 2021 at 11:55 AM Jonathan Tan [off-list ref] wrote:
"git read-tree" checks the existence of the blobs referenced by the
given tree, but does not bulk prefetch them. Add a bulk prefetch.

The lack of prefetch here was noticed at $DAYJOB during a merge
involving some specific commits, but I couldn't find a minimal merge
that didn't also trigger the prefetch in check_updates() in
unpack-trees.c (and in all these cases, the lack of prefetch in
cache-tree.c didn't matter because all the relevant blobs would have
already been prefetched by then). This is why I used read-tree here to
exercise this code path.
Okay, you have me stumped, I can't figure out what kind of a merge
would bypass the check_updates() in unpack-trees.c either.  I was
curious about octopus or merge.autostash, but I just can't trigger it.

Using read-tree to trigger the case makes perfect sense, though.
quoted hunk ↗ jump to hunk
Signed-off-by: Jonathan Tan <redacted>
---
 cache-tree.c                       | 11 ++++++++--
 t/t1022-read-tree-partial-clone.sh | 33 ++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+), 2 deletions(-)
 create mode 100755 t/t1022-read-tree-partial-clone.sh
diff --git a/cache-tree.c b/cache-tree.c
index 45e58666af..9ba2c7c6b2 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -237,6 +237,11 @@ int cache_tree_fully_valid(struct cache_tree *it)
        return 1;
 }

+static int must_check_existence(const struct cache_entry *ce)
+{
+       return !(has_promisor_remote() && ce_skip_worktree(ce));
+}
+
 static int update_one(struct cache_tree *it,
                      struct cache_entry **cache,
                      int entries,
@@ -378,8 +383,7 @@ static int update_one(struct cache_tree *it,
                }

                ce_missing_ok = mode == S_IFGITLINK || missing_ok ||
-                       (has_promisor_remote() &&
-                        ce_skip_worktree(ce));
+                       !must_check_existence(ce);
                if (is_null_oid(oid) ||
                    (!ce_missing_ok && !has_object_file(oid))) {
                        strbuf_release(&buffer);
@@ -466,6 +470,9 @@ int cache_tree_update(struct index_state *istate, int flags)
        if (!istate->cache_tree)
                istate->cache_tree = cache_tree();

+       if (!(flags & WRITE_TREE_MISSING_OK) && has_promisor_remote())
+               prefetch_cache_entries(istate, must_check_existence);
+
Nice that the fix is so simple.
quoted hunk ↗ jump to hunk
        trace_performance_enter();
        trace2_region_enter("cache_tree", "update", the_repository);
        i = update_one(istate->cache_tree, istate->cache, istate->cache_nr,
diff --git a/t/t1022-read-tree-partial-clone.sh b/t/t1022-read-tree-partial-clone.sh
new file mode 100755
index 0000000000..a763e27c7d
--- /dev/null
+++ b/t/t1022-read-tree-partial-clone.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+test_description='git read-tree in partial clones'
+
+TEST_NO_CREATE_REPO=1
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+. ./test-lib.sh
+
+test_expect_success 'read-tree in partial clone prefetches in one batch' '
+       test_when_finished "rm -rf server client trace" &&
+
+       git init server &&
+       echo foo >server/one &&
+       echo bar >server/two &&
+       git -C server add one two &&
+       git -C server commit -m "initial commit" &&
+       TREE=$(git -C server rev-parse HEAD^{tree}) &&
+
+       git -C server config uploadpack.allowfilter 1 &&
+       git -C server config uploadpack.allowanysha1inwant 1 &&
+       git clone --bare --filter=blob:none "file://$(pwd)/server" client &&
+       GIT_TRACE_PACKET="$(pwd)/trace" git -C client read-tree $TREE &&
+
+       # "done" marks the end of negotiation (once per fetch). Expect that
+       # only one fetch occurs.
+       grep "fetch> done" trace >donelines &&
+       test_line_count = 1 donelines
+'
+
+test_done
--
2.32.0.432.gabb21c7263-goog
Any reason for preferring GIT_TRACE_PACKET over GIT_TRACE2_PERF and
looking for the reported fetch_count (or even the number of
fetch_count lines)?  Just curious.

Anyway, looks good to me.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help