Re:Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
From: Xing Xin <hidden>
Date: 2024-05-20 10:20:06
At 2024-05-17 15:36:53, "Karthik Nayak" [off-list ref] wrote:
"blanet via GitGitGadget" [off-list ref] writes:quoted
From: Xing Xin <redacted>> When using the bundle-uri mechanism with a bundle list containing multiple interrelated bundles, we encountered a bug where tips from downloaded bundles were not being discovered, resulting in rather slow clones. This was particularly problematic when employing the heuristic `creationTokens`. And this is easy to reproduce. Suppose we have a repository with a single branch `main` pointing to commit `A`, firstly we create a base bundle with git bundle create base.bundle main Then let's add a new commit `B` on top of `A`, so that an incremental bundle for `main` can be created with git bundle create incr.bundle A..main Now we can generate a bundle list with the following content: [bundle] version = 1 mode = all heuristic = creationToken [bundle "base"] uri = base.bundle creationToken = 1 [bundle "incr"] uri = incr.bundle creationToken = 2 A fresh clone with the bundle list above would give the expected `refs/bundles/main` pointing at `B` in new repository, in other words we already had everything locally from the bundles, but git would still download everything from server as if we got nothing. So why the `refs/bundles/main` is not discovered? After some digging I found that: 1. when unbundling a downloaded bundle, a `verify_bundle` is called tos/a//
Thanks!
quoted
check its prerequisites if any. The verify procedure would find oids so `packed_git` is initialized.So the flow is: 1. `fetch_bundle_list` fetches all the bundles advertised via `download_bundle_list` to local files. 2. It then calls `unbundle_all_bundles` to unbundle all the bundles. 3. Each bundle is then unbundled using `unbundle_from_file`. 4. Here, we first read the bundle header to get all the prerequisites for the bundle, this is done in `read_bundle_header`. 5. Then we call `unbundle`, which calls `verify_bundle` to ensure that the repository does indeed contain the prerequisites mentioned in the bundle. Then it creates the index from the bundle file. So because the objects are being checked, the `prepare_packed_git` function is eventually called, which means that the `raw_object_store->packed_git` data gets filled in and `packed_git_initialized` is set. This means consecutive calls to `prepare_packed_git` doesn't re-initiate `raw_object_store->packed_git` since `packed_git_initialized` already is set. So your explanation makes sense, as a _nit_ I would perhaps add the part about why consecutive calls to `prepare_packed_git` are ineffective.
Thanks my friend, you have expressed this issue more clearly. I will post a new description based on your explanation with the creationToken case covered.
quoted
2. after unbundled all bundles, we would enter `do_fetch_pack_v2`,s/unbundled/unbundling
Copy that.
quoted
+#include "packfile.h" #include "pkt-line.h" #include "config.h" #include "remote.h"@@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file) VERIFY_BUNDLE_QUIET))) return 1; + reprepare_packed_git(r); +Would it make sense to move this to `bundle.c:unbundle()`, since that is also where the idx is created?
I wonder if we need a mental model that we should `reprepare_packed_git` that when a new pack and its corresponding idx is generated? Currently whether to call `reprepare_packed_git` is determined by the caller. But within the scope of this bug, I tend to remove the `REF_SKIP_OID_VERIFICATION` flag when writing refs as Patrick suggested. Xing Xin