--- v4
+++ v2
@@ -1,252 +1,40 @@
-During reflog expiry, the cmd_reflog_expire() function first iterates
-over all reflogs in logs/*, and then one-by-one acquires the lock for
-each one and expires it. This behavior has been with us since this
-command was implemented in 4264dc15e1 ("git reflog expire",
-2006-12-19).
+Add a comment about why it is that we need to check for the the
+existence of a reflog we're deleting after we've successfully acquired
+the lock in files_reflog_expire(). As noted in [1] the lock protocol
+for reflogs is somewhat intuitive.
-Change this to stop calling lock_ref_oid_basic() with the OID we saw
-when we looped over the logs, instead have it pass the OID it managed
-to lock.
+This early exit code the comment applies to dates all the way back to
+4264dc15e19 (git reflog expire, 2006-12-19).
-This mostly mitigates a race condition where e.g. "git gc" will fail
-in a concurrently updated repository because the branch moved since
-"git reflog expire --all" was started. I.e. with:
-
- error: cannot lock ref '<refname>': ref '<refname>' is at <OID-A> but expected <OID-B>
-
-This behavior of passing in an "oid" was needed for an edge-case that
-I've untangled in this and preceding commits though, namely that we
-needed this OID because we'd:
-
- 1. Lookup the reflog name/OID via dwim_log()
- 2. With that OID, lock the reflog
- 3. Later in builtin/reflog.c we use the OID we looked as input to
- lookup_commit_reference_gently(), assured that it's equal to the
- OID we got from dwim_log().
-
-We can be sure that this change is safe to make because between
-dwim_log (step #1) and lock_ref_oid_basic (step #2) there was no other
-logic relevant to the OID or expiry run in the cmd_reflog_expire()
-caller.
-
-We can thus treat that code as a black box, before and after this
-change it would get an OID that's been locked, the only difference is
-that now we mostly won't be failing to get the lock due to the TOCTOU
-race[0]. That failure was purely an implementation detail in how the
-"current OID" was looked up, it was divorced from the locking
-mechanism.
-
-What do we mean with "mostly"? It mostly mitigates it because we'll
-still run into cases where the ref is locked and being updated as we
-want to expire it, and other git processes wanting to update the refs
-will in turn race with us as we expire the reflog.
-
-That remaining race can in turn be mitigated with the
-core.filesRefLockTimeout setting, see 4ff0f01cb7 ("refs: retry
-acquiring reference locks for 100ms", 2017-08-21). In practice if that
-value is high enough we'll probably never have ref updates or reflog
-expiry failing, since the clients involved will retry for far longer
-than the time any of those operations could take.
-
-See [1] for an initial report of how this impacted "git gc" and a
-large discussion about this change in early 2019. In particular patch
-looked good to Michael Haggerty, see his[2]. That message seems to not
-have made it to the ML archive, its content is quoted in full in my
-[3].
-
-I'm leaving behind now-unused code the refs API etc. that takes the
-now-NULL "unused_oid" argument, and other code that can be simplified now
-that we never have on OID in that context, that'll be cleaned up in
-subsequent commits, but for now let's narrowly focus on fixing the
-"git gc" issue. As the modified assert() shows we always pass a NULL
-oid to reflog_expire() now.
-
-Unfortunately this sort of probabilistic contention is hard to turn
-into a test. I've tested this by running the following three subshells
-in concurrent terminals:
-
- (
- rm -rf /tmp/git &&
- git init /tmp/git &&
- while true
- do
- head -c 10 /dev/urandom | hexdump >/tmp/git/out &&
- git -C /tmp/git add out &&
- git -C /tmp/git commit -m"out"
- done
- )
-
- (
- rm -rf /tmp/git-clone &&
- git clone file:///tmp/git /tmp/git-clone &&
- while git -C /tmp/git-clone pull
- do
- date
- done
- )
-
- (
- while git -C /tmp/git-clone reflog expire --all
- do
- date
- done
- )
-
-Before this change the "reflog expire" would fail really quickly with
-the "but expected" error noted above.
-
-After this change both the "pull" and "reflog expire" will run for a
-while, but eventually fail because I get unlucky with
-core.filesRefLockTimeout (the "reflog expire" is in a really tight
-loop). As noted above that can in turn be mitigated with higher values
-of core.filesRefLockTimeout than the 100ms default.
-
-As noted in the commentary added in the preceding commit there's also
-the case of branches being racily deleted, that can be tested by
-adding this to the above:
-
- (
- while git -C /tmp/git-clone branch topic master &&
- git -C /tmp/git-clone branch -D topic
- do
- date
- done
- )
-
-With core.filesRefLockTimeout set to 10 seconds (it can probably be a
-lot lower) I managed to run all four of these concurrently for about
-an hour, and accumulated ~125k commits, auto-gc's and all, and didn't
-have a single failure. The loops visibly stall while waiting for the
-lock, but that's expected and desired behavior.
-
-0. https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
-1. https://lore.kernel.org/git/87tvg7brlm.fsf@evledraar.gmail.com/
-2. http://lore.kernel.org/git/b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu
-3. https://lore.kernel.org/git/87pnqkco8v.fsf@evledraar.gmail.com/
+1. https://lore.kernel.org/git/54DCDA42.2060800@alum.mit.edu/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
- builtin/reflog.c | 13 ++++++-------
- refs.h | 2 +-
- refs/files-backend.c | 7 +++++--
- 3 files changed, 12 insertions(+), 10 deletions(-)
+ refs/files-backend.c | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
-diff --git a/builtin/reflog.c b/builtin/reflog.c
-index 09541d1c80..61795f22d5 100644
---- a/builtin/reflog.c
-+++ b/builtin/reflog.c
-@@ -629,8 +629,9 @@ static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
- free_worktrees(worktrees);
- for (i = 0; i < collected.nr; i++) {
- struct collected_reflog *e = collected.e[i];
-+
- set_reflog_expiry_param(&cb.cmd, explicit_expiry, e->reflog);
-- status |= reflog_expire(e->reflog, &e->oid, flags,
-+ status |= reflog_expire(e->reflog, NULL, flags,
- reflog_expiry_prepare,
- should_expire_reflog_ent,
- reflog_expiry_cleanup,
-@@ -642,13 +643,12 @@ static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
-
- for (; i < argc; i++) {
- char *ref;
-- struct object_id oid;
-- if (!dwim_log(argv[i], strlen(argv[i]), &oid, &ref)) {
-+ if (!dwim_log(argv[i], strlen(argv[i]), NULL, &ref)) {
- status |= error(_("%s points nowhere!"), argv[i]);
- continue;
- }
- set_reflog_expiry_param(&cb.cmd, explicit_expiry, ref);
-- status |= reflog_expire(ref, &oid, flags,
-+ status |= reflog_expire(ref, NULL, flags,
- reflog_expiry_prepare,
- should_expire_reflog_ent,
- reflog_expiry_cleanup,
-@@ -700,7 +700,6 @@ static int cmd_reflog_delete(int argc, const char **argv, const char *prefix)
-
- for ( ; i < argc; i++) {
- const char *spec = strstr(argv[i], "@{");
-- struct object_id oid;
- char *ep, *ref;
- int recno;
-
-@@ -709,7 +708,7 @@ static int cmd_reflog_delete(int argc, const char **argv, const char *prefix)
- continue;
- }
-
-- if (!dwim_log(argv[i], spec - argv[i], &oid, &ref)) {
-+ if (!dwim_log(argv[i], spec - argv[i], NULL, &ref)) {
- status |= error(_("no reflog for '%s'"), argv[i]);
- continue;
- }
-@@ -724,7 +723,7 @@ static int cmd_reflog_delete(int argc, const char **argv, const char *prefix)
- cb.cmd.expire_total = 0;
- }
-
-- status |= reflog_expire(ref, &oid, flags,
-+ status |= reflog_expire(ref, NULL, flags,
- reflog_expiry_prepare,
- should_expire_reflog_ent,
- reflog_expiry_cleanup,
-diff --git a/refs.h b/refs.h
-index 48970dfc7e..ddbf15f1c2 100644
---- a/refs.h
-+++ b/refs.h
-@@ -796,7 +796,7 @@ enum expire_reflog_flags {
- * expiration policy that is desired.
- *
- * reflog_expiry_prepare_fn -- Called once after the reference is
-- * locked.
-+ * locked. Called with the OID of the locked reference.
- *
- * reflog_expiry_should_prune_fn -- Called once for each entry in the
- * existing reflog. It should return true iff that entry should be
diff --git a/refs/files-backend.c b/refs/files-backend.c
-index 5415306416..ccdf455049 100644
+index ec9c70d79cc..f73637fa087 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
-@@ -3032,7 +3032,7 @@ static int expire_reflog_ent(struct object_id *ooid, struct object_id *noid,
- }
-
- static int files_reflog_expire(struct ref_store *ref_store,
-- const char *refname, const struct object_id *oid,
-+ const char *refname, const struct object_id *unused_oid,
- unsigned int flags,
- reflog_expiry_prepare_fn prepare_fn,
- reflog_expiry_should_prune_fn should_prune_fn,
-@@ -3049,6 +3049,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
- int status = 0;
- int type;
- struct strbuf err = STRBUF_INIT;
-+ const struct object_id *oid;
-
- memset(&cb, 0, sizeof(cb));
- cb.flags = flags;
-@@ -3060,7 +3061,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
- * reference itself, plus we might need to update the
- * reference if --updateref was specified:
- */
-- lock = lock_ref_oid_basic(refs, refname, oid,
-+ lock = lock_ref_oid_basic(refs, refname, NULL,
- REF_NO_DEREF,
- &type, &err);
- if (!lock) {
-@@ -3068,6 +3069,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
+@@ -3068,6 +3068,17 @@ static int files_reflog_expire(struct ref_store *ref_store,
strbuf_release(&err);
return -1;
}
-+ oid = &lock->old_oid;
-
- /*
- * When refs are deleted, their reflog is deleted before the
-@@ -3111,6 +3113,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
- }
- }
-
-+ assert(!unused_oid);
- (*prepare_fn)(refname, oid, cb.policy_cb);
- refs_for_each_reflog_ent(ref_store, refname, expire_reflog_ent, &cb);
- (*cleanup_fn)(cb.policy_cb);
++
++ /*
++ * When refs are deleted their reflog is deleted before the
++ * ref itself deleted. This race happens because there's no
++ * such thing as a lock on the reflog, instead we always lock
++ * the "loose ref" (even if packet) above with
++ * lock_ref_oid_basic().
++ *
++ * If race happens we've got nothing more to do, we were asked
++ * to delete the reflog, and it's not there anymore. Great!
++ */
+ if (!refs_reflog_exists(ref_store, refname)) {
+ unlock_ref(lock);
+ return 0;
--
-2.32.0.956.g6b0c84ceda8
+2.32.0.873.gb6f2f696497