Thread (50 messages) 50 messages, 4 authors, 2024-06-19

Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed

From: Patrick Steinhardt <hidden>
Date: 2024-05-17 05:00:56

On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote:
From: Xing Xin <redacted>
Long time no see :)
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not being discovered, resulting in rather slow
clones. This was particularly problematic when employing the heuristic
`creationTokens`.

And this is easy to reproduce. Suppose we have a repository with a
single branch `main` pointing to commit `A`, firstly we create a base
bundle with

  git bundle create base.bundle main

Then let's add a new commit `B` on top of `A`, so that an incremental
bundle for `main` can be created with

  git bundle create incr.bundle A..main

Now we can generate a bundle list with the following content:

  [bundle]
      version = 1
      mode = all
      heuristic = creationToken

  [bundle "base"]
      uri = base.bundle
      creationToken = 1

  [bundle "incr"]
      uri = incr.bundle
      creationToken = 2

A fresh clone with the bundle list above would give the expected
`refs/bundles/main` pointing at `B` in new repository, in other words we
already had everything locally from the bundles, but git would still
download everything from server as if we got nothing.

So why the `refs/bundles/main` is not discovered? After some digging I
found that:

1. when unbundling a downloaded bundle, a `verify_bundle` is called to
   check its prerequisites if any. The verify procedure would find oids
   so `packed_git` is initialized.

2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
   during which `mark_complete_and_common_ref` and `mark_tips` would
   find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be
   enlisted if `packed_git` has already initialized in 1.
And I assume we do not want it to not use `OBJECT_INFO_QUICK`?
Back to the example above, when unbunding `incr.bundle`, `base.pack` is
enlisted to `packed_git` bacause of the prerequisites to verify. Then we
can not find `B` for negotiation at a latter time bacause `B` exists in
`incr.pack` which is not enlisted in `packed_git`.
Okay, the explanation feels sensible.
This commit fixes this by adding a `reprepare_packed_git` call for every
successfully unbundled bundle, which ensures to enlist all generated
packs from bundle uri. And a set of negotiation related tests are added.
This makes me wonder though. Do we really need to call
`reprepare_packed_git()` once for every bundle, or can't we instead call
it once at the end once we have fetched all bundles? Reading on.
quoted hunk ↗ jump to hunk
Signed-off-by: Xing Xin <redacted>
---
    bundle-uri: refresh packed_git if unbundle succeed

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1730

 bundle-uri.c                |   3 +
 t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++
 2 files changed, 132 insertions(+)
diff --git a/bundle-uri.c b/bundle-uri.c
index ca32050a78f..2b9d36cfd8e 100644
--- a/bundle-uri.c
+++ b/bundle-uri.c
@@ -7,6 +7,7 @@
 #include "refs.h"
 #include "run-command.h"
 #include "hashmap.h"
+#include "packfile.h"
 #include "pkt-line.h"
 #include "config.h"
 #include "remote.h"
@@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file)
 			       VERIFY_BUNDLE_QUIET)))
 		return 1;
 
+	reprepare_packed_git(r);
+
So what's hidden here is that `unbundle_from_file()` will also try to
access the bundle's refs right away. Surprisingly, we do so by calling
`refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the
effect that we basically accept arbitrary object IDs here even if we do
not know those. That's why we didn't have to `reprepare_packed_git()`
before this change.

Now there are two conflicting thoughts here:

  - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs
    should now be accessible.

  - Or we can avoid calling `reprepare_packed_git()` inside the loop and
    instead call it once after we have fetched all bundles.

The second one feels a bit like premature optimization to me. But the
first item does feel like it could help us to catch broken bundles
because we wouldn't end up creating refs for objects that neither we nor
the bundle have.

Patrick

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help