Re: [PATCH v4 11/12] refs: implement logic to migrate between ref storage formats
From: Karthik Nayak <hidden>
Date: 2024-06-04 15:28:09
Patrick Steinhardt [off-list ref] writes: [snip]
quoted hunk ↗ jump to hunk
diff --git a/refs.c b/refs.c index 9b112b0527..f7c7765d23 100644 --- a/refs.c +++ b/refs.c@@ -2570,3 +2570,308 @@ int ref_update_check_old_target(const char *referent, struct ref_update *update, referent, update->old_target); return -1; } + +struct migration_data { + struct ref_store *old_refs; + struct ref_transaction *transaction; + struct strbuf *errbuf; +}; + +static int migrate_one_ref(const char *refname, const struct object_id *oid, + int flags, void *cb_data) +{ + struct migration_data *data = cb_data; + struct strbuf symref_target = STRBUF_INIT; + int ret; + + if (flags & REF_ISSYMREF) { + ret = refs_read_symbolic_ref(data->old_refs, refname, &symref_target); + if (ret < 0) + goto done; + + ret = ref_transaction_update(data->transaction, refname, NULL, null_oid(), + symref_target.buf, NULL, + REF_SKIP_CREATE_REFLOG | REF_NO_DEREF, NULL, data->errbuf); + if (ret < 0) + goto done; + } else { + ret = ref_transaction_create(data->transaction, refname, oid, + REF_SKIP_CREATE_REFLOG | REF_SKIP_OID_VERIFICATION, + NULL, data->errbuf); + if (ret < 0) + goto done; + }
I was a little perplexed about the first scenario being `ref_transaction_update` and the second being `ref_transaction_create`, I then realized that this is because the latter doesn't support creating symrefs yet (changes in my series kn/update-ref-symref), makes sense to do it this way. [snip]
+int repo_migrate_ref_storage_format(struct repository *repo,
+ enum ref_storage_format format,
+ unsigned int flags,
+ struct strbuf *errbuf)
+{
+ struct ref_store *old_refs = NULL, *new_refs = NULL;
+ struct ref_transaction *transaction = NULL;
+ struct strbuf buf = STRBUF_INIT;
+ struct migration_data data;
+ size_t reflog_count = 0;
+ char *new_gitdir = NULL;
+ int did_migrate_refs = 0;
+ int ret;
+
+ old_refs = get_main_ref_store(repo);Should we add a check to ensure the `old_refs->repo->ref_storage_format` and `format` are different?
+
+ /*
+ * We do not have any interfaces that would allow us to write many
+ * reflog entries. Once we have them we can remove this restriction.
+ */
+ if (refs_for_each_reflog(old_refs, count_reflogs, &reflog_count) < 0) {
+ strbuf_addstr(errbuf, "cannot count reflogs");
+ ret = -1;
+ goto done;
+ }
+ if (reflog_count) {
+ strbuf_addstr(errbuf, "migrating reflogs is not supported yet");
+ ret = -1;
+ goto done;
+ }Isn't this restrictive? It would be nice to perhaps say "git refs migrate --ignore-reflogs", which could make it possible to not care about reflogs. But maybe that can be part of a follow up.
+ /*
+ * Worktrees complicate the migration because every worktree has a
+ * separate ref storage. While it should be feasible to implement, this
+ * is pushed out to a future iteration.
+ *
+ * TODO: we should really be passing the caller-provided repository to
+ * `has_worktrees()`, but our worktree subsystem doesn't yet support
+ * that.
+ */
+ if (has_worktrees()) {
+ strbuf_addstr(errbuf, "migrating repositories with worktrees is not supported yet");
+ ret = -1;
+ goto done;
+ }
+Same as above.
+ /*
+ * The overall logic looks like this:
+ *
+ * 1. Set up a new temporary directory and initialize it with the new
+ * format. This is where all refs will be migrated into.
+ *
+ * 2. Enumerate all refs and write them into the new ref storage.
+ * This operation is safe as we do not yet modify the main
+ * repository.
+ *
+ * 3. If we're in dry-run mode then we are done and can hand over the
+ * directory to the caller for inspection. If not, we now start
+ * with the destructive part.
+ *
+ * 4. Delete the old ref storage from disk. As we have a copy of refs
+ * in the new ref storage it's okay(ish) if we now get interrupted
+ * as there is an equivalent copy of all refs available.
+ *
+ * 5. Move the new ref storage files into place.
+ *
+ * 6. Change the repository format to the new ref format.
+ */
+ strbuf_addf(&buf, "%s/%s", old_refs->gitdir, "ref_migration.XXXXXX");
+ new_gitdir = mkdtemp(xstrdup(buf.buf));
+ if (!new_gitdir) {
+ strbuf_addf(errbuf, "cannot create migration directory: %s",
+ strerror(errno));
+ ret = -1;
+ goto done;
+ }
+
+ new_refs = ref_store_init(repo, format, new_gitdir,
+ REF_STORE_ALL_CAPS);
+ ret = ref_store_create_on_disk(new_refs, 0, errbuf);
+ if (ret < 0)
+ goto done;
+
+ transaction = ref_store_transaction_begin(new_refs, errbuf);
+ if (!transaction)
+ goto done;
+
+ data.old_refs = old_refs;
+ data.transaction = transaction;
+ data.errbuf = errbuf;
+
+ /*
+ * We need to use the internal `do_for_each_ref()` here so that we can
+ * also include broken refs and symrefs. These would otherwise be
+ * skipped silently.
+ *
+ * Ideally, we would do this call while locking the old ref storage
+ * such that there cannot be any concurrent modifications. We do not
+ * have the infra for that though, and the "files" backend does not
+ * allow for a central lock due to its design. It's thus on the user to
+ * ensure that there are no concurrent writes.
+ */
+ ret = do_for_each_ref(old_refs, "", NULL, migrate_one_ref, 0,
+ DO_FOR_EACH_INCLUDE_ROOT_REFS | DO_FOR_EACH_INCLUDE_BROKEN,
+ &data);
+ if (ret < 0)
+ goto done;
+
+ /*
+ * TODO: we might want to migrate to `initial_ref_transaction_commit()`
+ * here, which is more efficient for the files backend because it would
+ * write new refs into the packed-refs file directly. At this point,
+ * the files backend doesn't handle pseudo-refs and symrefs correctly
+ * though, so this requires some more work.
+ */
+ ret = ref_transaction_commit(transaction, errbuf);
+ if (ret < 0)
+ goto done;
+ did_migrate_refs = 1;
+
+ if (flags & REPO_MIGRATE_REF_STORAGE_FORMAT_DRYRUN) {
+ printf(_("Finished dry-run migration of refs, "
+ "the result can be found at '%s'\n"), new_gitdir);
+ ret = 0;
+ goto done;
+ }
+
+ /*
+ * Until now we were in the non-destructive phase, where we only
+ * populated the new ref store. From hereon though we are about
+ * to get hands by deleting the old ref store and then moving
+ * the new one into place.
+ *
+ * Assuming that there were no concurrent writes, the new ref
+ * store should have all information. So if we fail from hereon
+ * we may be in an in-between state, but it would still be able
+ * to recover by manually moving remaining files from the
+ * temporary migration directory into place.
+ */This also means that the recovery would only be possible into the new format. Makes sense. [snip]
Attachments
- signature.asc [application/pgp-signature] 690 bytes