Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
From: Karthik Nayak <hidden>
Date: 2024-05-17 07:36:55
"blanet via GitGitGadget" [off-list ref] writes:
From: Xing Xin <redacted>>
When using the bundle-uri mechanism with a bundle list containing
multiple interrelated bundles, we encountered a bug where tips from
downloaded bundles were not being discovered, resulting in rather slow
clones. This was particularly problematic when employing the heuristic
`creationTokens`.
And this is easy to reproduce. Suppose we have a repository with a
single branch `main` pointing to commit `A`, firstly we create a base
bundle with
git bundle create base.bundle main
Then let's add a new commit `B` on top of `A`, so that an incremental
bundle for `main` can be created with
git bundle create incr.bundle A..main
Now we can generate a bundle list with the following content:
[bundle]
version = 1
mode = all
heuristic = creationToken
[bundle "base"]
uri = base.bundle
creationToken = 1
[bundle "incr"]
uri = incr.bundle
creationToken = 2
A fresh clone with the bundle list above would give the expected
`refs/bundles/main` pointing at `B` in new repository, in other words we
already had everything locally from the bundles, but git would still
download everything from server as if we got nothing.
So why the `refs/bundles/main` is not discovered? After some digging I
found that:
1. when unbundling a downloaded bundle, a `verify_bundle` is called tos/a//
check its prerequisites if any. The verify procedure would find oids so `packed_git` is initialized.
So the flow is: 1. `fetch_bundle_list` fetches all the bundles advertised via `download_bundle_list` to local files. 2. It then calls `unbundle_all_bundles` to unbundle all the bundles. 3. Each bundle is then unbundled using `unbundle_from_file`. 4. Here, we first read the bundle header to get all the prerequisites for the bundle, this is done in `read_bundle_header`. 5. Then we call `unbundle`, which calls `verify_bundle` to ensure that the repository does indeed contain the prerequisites mentioned in the bundle. Then it creates the index from the bundle file. So because the objects are being checked, the `prepare_packed_git` function is eventually called, which means that the `raw_object_store->packed_git` data gets filled in and `packed_git_initialized` is set. This means consecutive calls to `prepare_packed_git` doesn't re-initiate `raw_object_store->packed_git` since `packed_git_initialized` already is set. So your explanation makes sense, as a _nit_ I would perhaps add the part about why consecutive calls to `prepare_packed_git` are ineffective.
2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,
s/unbundled/unbundling
during which `mark_complete_and_common_ref` and `mark_tips` would find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be enlisted if `packed_git` has already initialized in 1. Back to the example above, when unbunding `incr.bundle`, `base.pack` is enlisted to `packed_git` bacause of the prerequisites to verify. Then we can not find `B` for negotiation at a latter time bacause `B` exists in `incr.pack` which is not enlisted in `packed_git`. This commit fixes this by adding a `reprepare_packed_git` call for every successfully unbundled bundle, which ensures to enlist all generated packs from bundle uri. And a set of negotiation related tests are added.
The solution makes sense.
quoted hunk ↗ jump to hunk
Signed-off-by: Xing Xin <redacted> --- bundle-uri: refresh packed_git if unbundle succeed Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/1730 bundle-uri.c | 3 + t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++ 2 files changed, 132 insertions(+)diff --git a/bundle-uri.c b/bundle-uri.c index ca32050a78f..2b9d36cfd8e 100644 --- a/bundle-uri.c +++ b/bundle-uri.c@@ -7,6 +7,7 @@ #include "refs.h" #include "run-command.h" #include "hashmap.h" +#include "packfile.h" #include "pkt-line.h" #include "config.h" #include "remote.h"@@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file) VERIFY_BUNDLE_QUIET))) return 1; + reprepare_packed_git(r); +
Would it make sense to move this to `bundle.c:unbundle()`, since that is also where the idx is created? [snip]
Attachments
- signature.asc [application/pgp-signature] 690 bytes