Re: Bug report: v2.47.0 cannot fetch version 1 pack indexes
From: Jeff King <hidden>
Date: 2024-10-22 05:14:03
Subsystem:
the rest · Maintainer:
Linus Torvalds
On Mon, Oct 21, 2024 at 04:33:15PM -0400, Taylor Blau wrote:
quoted
@@ -2388,8 +2389,24 @@ static char *fetch_pack_index(unsigned char *hash, const char *base_url) strbuf_addf(&buf, "objects/pack/pack-%s.idx", hash_to_hex(hash)); url = strbuf_detach(&buf, NULL); - strbuf_addf(&buf, "%s.temp", sha1_pack_index_name(hash)); - tmp = strbuf_detach(&buf, NULL); + /* + * Don't put this into packs/, since it's just temporary and we don't + * want to confuse it with our local .idx files. We'll generate our + * own index if we choose to download the matching packfile. + * + * It's tempting to use xmks_tempfile() here, but it's important that + * the file not exist, otherwise http_get_file() complains. So we + * create a filename that should be unique, and then just register it + * as a tempfile so that it will get cleaned up on exit. + * + * Arguably it would be better to hold on to the tempfile handle so + * that we can remove it as soon as we download the pack and generate + * the real index, but that might need more surgery. + */ + tmp = xstrfmt("%s/tmp_pack_%s.idx", + repo_get_object_directory(the_repository), + hash_to_hex(hash)); + register_tempfile(tmp);Makes perfect sense, and the comment above here is much appreciated. I thought about trying to use some intermediate state of the strbuf here to avoid an extra xstrfmt() call, but couldn't come up with anything I didn't think was awkward.
I don't think there's any useful intermediate state. The earlier %s is the base url, but here it's our local directory. We could continue to re-use the scratch strbuf as the existing code did (and which xstrfmt() is doing under the hood). It wasn't really intentional for me to change that, but I went through a lot of attempts to get here (using mks_tempfile(), and so on).
quoted
+static char *pack_path_from_idx(const char *idx_path) +{ + size_t len; + if (!strip_suffix(idx_path, ".idx", &len)) + BUG("idx path does not end in .idx: %s", idx_path); + return xstrfmt("%.*s.pack", (int)len, idx_path); +} + struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path) { - const char *path = sha1_pack_name(sha1); + char *path = pack_path_from_idx(idx_path);Huh. I would have thought we have such a helper function already. I guess we probably do, but that it's also defined statically because it's so easy to write.
I thought so, too, but couldn't find one. We have pack_bitmap_filename() (and so on for .rev and .midx files) that goes from .pack to those extensions. But here we want to go from .idx to .pack. I think most stuff goes from ".pack" because that's what we store in the packed_git struct. There's also sha1_pack_index_name(), but that goes from a csum-file hash to a filename. I grepped around and strip_suffix() seems to be par for the course in similar situations within pack/repack code, so I think it's OK here.
In any case, this looks like the right thing to do here. It would be nice to have a corresponding test here, since unlike the other finalize_object_file() changes, this one can be provoked deterministically. Would you mind submitting this as a bona-fide patch, which I can then pick up and start merging down?
Yeah, the test is easy:
diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 58189c9f7d..50a7b98813 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh@@ -507,4 +507,14 @@ test_expect_success 'fetching via http alternates works' ' git -c http.followredirects=true clone "$HTTPD_URL/dumb/alt-child.git" ' +test_expect_success 'dumb http can fetch index v1' ' + server=$HTTPD_DOCUMENT_ROOT_PATH/idx-v1.git && + git init --bare "$server" && + git -C "$server" --work-tree=. commit --allow-empty -m foo && + git -C "$server" -c pack.indexVersion=1 gc && + + git clone "$HTTPD_URL/dumb/idx-v1.git" && + git -C idx-v1 fsck +' + test_done
I raised some other more philosophical issues in the other part of the thread, but assuming the answer is "no, let's do the simplest thing", then I think this approach is OK. I'd also like to see if I can clean things up around parse_pack_index(), whose semantics I'm changing here (and which violates all manner of assumptions that we usually have about packed_git structs). It's used only by the dumb-http code, and I think we want to refactor it a bit so that nobody else is tempted to use it. I'll try to send something out tonight or tomorrow. -Peff