Re:Re: [PATCH] bundle-uri: refresh packed_git if unbundle succeed
From: Xing Xin <hidden>
Date: 2024-05-20 09:41:32
At 2024-05-17 13:00:49, "Patrick Steinhardt" [off-list ref] wrote:
On Wed, May 15, 2024 at 03:01:09AM +0000, blanet via GitGitGadget wrote:quoted
From: Xing Xin <redacted>Long time no see :)
Glad to see you again here :)
quoted
So why the `refs/bundles/main` is not discovered? After some digging I found that: 1. when unbundling a downloaded bundle, a `verify_bundle` is called to check its prerequisites if any. The verify procedure would find oids so `packed_git` is initialized. 2. after unbundled all bundles, we would enter `do_fetch_pack_v2`, during which `mark_complete_and_common_ref` and `mark_tips` would find oids with `OBJECT_INFO_QUICK` flag set, so no new packs would be enlisted if `packed_git` has already initialized in 1.And I assume we do not want it to not use `OBJECT_INFO_QUICK`?
I think so. For clones or fetches without using bundle-uri, we can hardly encounter the case that new packs are added during the negotiation process. So using `OBJECT_INFO_QUICK` can get some performance gain.
quoted
Back to the example above, when unbunding `incr.bundle`, `base.pack` is enlisted to `packed_git` bacause of the prerequisites to verify. Then we can not find `B` for negotiation at a latter time bacause `B` exists in `incr.pack` which is not enlisted in `packed_git`.Okay, the explanation feels sensible.quoted
This commit fixes this by adding a `reprepare_packed_git` call for every successfully unbundled bundle, which ensures to enlist all generated packs from bundle uri. And a set of negotiation related tests are added.This makes me wonder though. Do we really need to call `reprepare_packed_git()` once for every bundle, or can't we instead call it once at the end once we have fetched all bundles? Reading on.quoted
Signed-off-by: Xing Xin <redacted> --- bundle-uri: refresh packed_git if unbundle succeed Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1730%2Fblanet%2Fxx%2Fbundle-uri-bug-using-bundle-list-v1 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1730/blanet/xx/bundle-uri-bug-using-bundle-list-v1 Pull-Request: https://github.com/gitgitgadget/git/pull/1730 bundle-uri.c | 3 + t/t5558-clone-bundle-uri.sh | 129 ++++++++++++++++++++++++++++++++++++ 2 files changed, 132 insertions(+)diff --git a/bundle-uri.c b/bundle-uri.c index ca32050a78f..2b9d36cfd8e 100644 --- a/bundle-uri.c +++ b/bundle-uri.c@@ -7,6 +7,7 @@ #include "refs.h" #include "run-command.h" #include "hashmap.h" +#include "packfile.h" #include "pkt-line.h" #include "config.h" #include "remote.h"@@ -376,6 +377,8 @@ static int unbundle_from_file(struct repository *r, const char *file) VERIFY_BUNDLE_QUIET))) return 1; + reprepare_packed_git(r); +So what's hidden here is that `unbundle_from_file()` will also try to access the bundle's refs right away. Surprisingly, we do so by calling `refs_update_ref()` with `REF_SKIP_OID_VERIFICATION`, which has the effect that we basically accept arbitrary object IDs here even if we do not know those. That's why we didn't have to `reprepare_packed_git()` before this change.
You are right! I tried dropping this `REF_SKIP_OID_VERIFICATION` flag and the negotiation works as expected. After some further digging I find that without `REF_SKIP_OID_VERIFICATION`, both `write_ref_to_lockfile` for files backend and `reftable_be_transaction_prepare` for reftable backend would call `parse_object` to check the oid. `parse_object` can help refresh `packed_git` via `reprepare_packed_git`.
Now there are two conflicting thoughts here: - Either we can now drop `REF_SKIP_OID_VERIFICATION` as the object IDs should now be accessible. - Or we can avoid calling `reprepare_packed_git()` inside the loop and instead call it once after we have fetched all bundles. The second one feels a bit like premature optimization to me. But the first item does feel like it could help us to catch broken bundles because we wouldn't end up creating refs for objects that neither we nor the bundle have.
I favor the first approach because a validation on the object IDs we are writing is a safe guard . And the flag itself was designed to be used in testing scenarios. /* * Blindly write an object_id. This is useful for testing data corruption * scenarios. */ #define REF_SKIP_OID_VERIFICATION (1 << 10)