Inter-revision diff: patch 8

Comparing v4 (message) to v2 (message)

--- v4
+++ v2
@@ -1,252 +1,40 @@
-During reflog expiry, the cmd_reflog_expire() function first iterates
-over all reflogs in logs/*, and then one-by-one acquires the lock for
-each one and expires it. This behavior has been with us since this
-command was implemented in 4264dc15e1 ("git reflog expire",
-2006-12-19).
+Add a comment about why it is that we need to check for the the
+existence of a reflog we're deleting after we've successfully acquired
+the lock in files_reflog_expire(). As noted in [1] the lock protocol
+for reflogs is somewhat intuitive.
 
-Change this to stop calling lock_ref_oid_basic() with the OID we saw
-when we looped over the logs, instead have it pass the OID it managed
-to lock.
+This early exit code the comment applies to dates all the way back to
+4264dc15e19 (git reflog expire, 2006-12-19).
 
-This mostly mitigates a race condition where e.g. "git gc" will fail
-in a concurrently updated repository because the branch moved since
-"git reflog expire --all" was started. I.e. with:
-
-    error: cannot lock ref '<refname>': ref '<refname>' is at <OID-A> but expected <OID-B>
-
-This behavior of passing in an "oid" was needed for an edge-case that
-I've untangled in this and preceding commits though, namely that we
-needed this OID because we'd:
-
- 1. Lookup the reflog name/OID via dwim_log()
- 2. With that OID, lock the reflog
- 3. Later in builtin/reflog.c we use the OID we looked as input to
-    lookup_commit_reference_gently(), assured that it's equal to the
-    OID we got from dwim_log().
-
-We can be sure that this change is safe to make because between
-dwim_log (step #1) and lock_ref_oid_basic (step #2) there was no other
-logic relevant to the OID or expiry run in the cmd_reflog_expire()
-caller.
-
-We can thus treat that code as a black box, before and after this
-change it would get an OID that's been locked, the only difference is
-that now we mostly won't be failing to get the lock due to the TOCTOU
-race[0]. That failure was purely an implementation detail in how the
-"current OID" was looked up, it was divorced from the locking
-mechanism.
-
-What do we mean with "mostly"? It mostly mitigates it because we'll
-still run into cases where the ref is locked and being updated as we
-want to expire it, and other git processes wanting to update the refs
-will in turn race with us as we expire the reflog.
-
-That remaining race can in turn be mitigated with the
-core.filesRefLockTimeout setting, see 4ff0f01cb7 ("refs: retry
-acquiring reference locks for 100ms", 2017-08-21). In practice if that
-value is high enough we'll probably never have ref updates or reflog
-expiry failing, since the clients involved will retry for far longer
-than the time any of those operations could take.
-
-See [1] for an initial report of how this impacted "git gc" and a
-large discussion about this change in early 2019. In particular patch
-looked good to Michael Haggerty, see his[2]. That message seems to not
-have made it to the ML archive, its content is quoted in full in my
-[3].
-
-I'm leaving behind now-unused code the refs API etc. that takes the
-now-NULL "unused_oid" argument, and other code that can be simplified now
-that we never have on OID in that context, that'll be cleaned up in
-subsequent commits, but for now let's narrowly focus on fixing the
-"git gc" issue. As the modified assert() shows we always pass a NULL
-oid to reflog_expire() now.
-
-Unfortunately this sort of probabilistic contention is hard to turn
-into a test. I've tested this by running the following three subshells
-in concurrent terminals:
-
-    (
-        rm -rf /tmp/git &&
-        git init /tmp/git &&
-        while true
-        do
-            head -c 10 /dev/urandom | hexdump >/tmp/git/out &&
-            git -C /tmp/git add out &&
-            git -C /tmp/git commit -m"out"
-        done
-    )
-
-    (
-	rm -rf /tmp/git-clone &&
-        git clone file:///tmp/git /tmp/git-clone &&
-        while git -C /tmp/git-clone pull
-        do
-            date
-        done
-    )
-
-    (
-        while git -C /tmp/git-clone reflog expire --all
-        do
-            date
-        done
-    )
-
-Before this change the "reflog expire" would fail really quickly with
-the "but expected" error noted above.
-
-After this change both the "pull" and "reflog expire" will run for a
-while, but eventually fail because I get unlucky with
-core.filesRefLockTimeout (the "reflog expire" is in a really tight
-loop). As noted above that can in turn be mitigated with higher values
-of core.filesRefLockTimeout than the 100ms default.
-
-As noted in the commentary added in the preceding commit there's also
-the case of branches being racily deleted, that can be tested by
-adding this to the above:
-
-    (
-        while git -C /tmp/git-clone branch topic master &&
-	      git -C /tmp/git-clone branch -D topic
-        do
-            date
-        done
-    )
-
-With core.filesRefLockTimeout set to 10 seconds (it can probably be a
-lot lower) I managed to run all four of these concurrently for about
-an hour, and accumulated ~125k commits, auto-gc's and all, and didn't
-have a single failure. The loops visibly stall while waiting for the
-lock, but that's expected and desired behavior.
-
-0. https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
-1. https://lore.kernel.org/git/87tvg7brlm.fsf@evledraar.gmail.com/
-2. http://lore.kernel.org/git/b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu
-3. https://lore.kernel.org/git/87pnqkco8v.fsf@evledraar.gmail.com/
+1. https://lore.kernel.org/git/54DCDA42.2060800@alum.mit.edu/
 
 Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
 ---
- builtin/reflog.c     | 13 ++++++-------
- refs.h               |  2 +-
- refs/files-backend.c |  7 +++++--
- 3 files changed, 12 insertions(+), 10 deletions(-)
+ refs/files-backend.c | 11 +++++++++++
+ 1 file changed, 11 insertions(+)
 
-diff --git a/builtin/reflog.c b/builtin/reflog.c
-index 09541d1c80..61795f22d5 100644
---- a/builtin/reflog.c
-+++ b/builtin/reflog.c
-@@ -629,8 +629,9 @@ static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
- 		free_worktrees(worktrees);
- 		for (i = 0; i < collected.nr; i++) {
- 			struct collected_reflog *e = collected.e[i];
-+
- 			set_reflog_expiry_param(&cb.cmd, explicit_expiry, e->reflog);
--			status |= reflog_expire(e->reflog, &e->oid, flags,
-+			status |= reflog_expire(e->reflog, NULL, flags,
- 						reflog_expiry_prepare,
- 						should_expire_reflog_ent,
- 						reflog_expiry_cleanup,
-@@ -642,13 +643,12 @@ static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
- 
- 	for (; i < argc; i++) {
- 		char *ref;
--		struct object_id oid;
--		if (!dwim_log(argv[i], strlen(argv[i]), &oid, &ref)) {
-+		if (!dwim_log(argv[i], strlen(argv[i]), NULL, &ref)) {
- 			status |= error(_("%s points nowhere!"), argv[i]);
- 			continue;
- 		}
- 		set_reflog_expiry_param(&cb.cmd, explicit_expiry, ref);
--		status |= reflog_expire(ref, &oid, flags,
-+		status |= reflog_expire(ref, NULL, flags,
- 					reflog_expiry_prepare,
- 					should_expire_reflog_ent,
- 					reflog_expiry_cleanup,
-@@ -700,7 +700,6 @@ static int cmd_reflog_delete(int argc, const char **argv, const char *prefix)
- 
- 	for ( ; i < argc; i++) {
- 		const char *spec = strstr(argv[i], "@{");
--		struct object_id oid;
- 		char *ep, *ref;
- 		int recno;
- 
-@@ -709,7 +708,7 @@ static int cmd_reflog_delete(int argc, const char **argv, const char *prefix)
- 			continue;
- 		}
- 
--		if (!dwim_log(argv[i], spec - argv[i], &oid, &ref)) {
-+		if (!dwim_log(argv[i], spec - argv[i], NULL, &ref)) {
- 			status |= error(_("no reflog for '%s'"), argv[i]);
- 			continue;
- 		}
-@@ -724,7 +723,7 @@ static int cmd_reflog_delete(int argc, const char **argv, const char *prefix)
- 			cb.cmd.expire_total = 0;
- 		}
- 
--		status |= reflog_expire(ref, &oid, flags,
-+		status |= reflog_expire(ref, NULL, flags,
- 					reflog_expiry_prepare,
- 					should_expire_reflog_ent,
- 					reflog_expiry_cleanup,
-diff --git a/refs.h b/refs.h
-index 48970dfc7e..ddbf15f1c2 100644
---- a/refs.h
-+++ b/refs.h
-@@ -796,7 +796,7 @@ enum expire_reflog_flags {
-  * expiration policy that is desired.
-  *
-  * reflog_expiry_prepare_fn -- Called once after the reference is
-- *     locked.
-+ *     locked. Called with the OID of the locked reference.
-  *
-  * reflog_expiry_should_prune_fn -- Called once for each entry in the
-  *     existing reflog. It should return true iff that entry should be
 diff --git a/refs/files-backend.c b/refs/files-backend.c
-index 5415306416..ccdf455049 100644
+index ec9c70d79cc..f73637fa087 100644
 --- a/refs/files-backend.c
 +++ b/refs/files-backend.c
-@@ -3032,7 +3032,7 @@ static int expire_reflog_ent(struct object_id *ooid, struct object_id *noid,
- }
- 
- static int files_reflog_expire(struct ref_store *ref_store,
--			       const char *refname, const struct object_id *oid,
-+			       const char *refname, const struct object_id *unused_oid,
- 			       unsigned int flags,
- 			       reflog_expiry_prepare_fn prepare_fn,
- 			       reflog_expiry_should_prune_fn should_prune_fn,
-@@ -3049,6 +3049,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
- 	int status = 0;
- 	int type;
- 	struct strbuf err = STRBUF_INIT;
-+	const struct object_id *oid;
- 
- 	memset(&cb, 0, sizeof(cb));
- 	cb.flags = flags;
-@@ -3060,7 +3061,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
- 	 * reference itself, plus we might need to update the
- 	 * reference if --updateref was specified:
- 	 */
--	lock = lock_ref_oid_basic(refs, refname, oid,
-+	lock = lock_ref_oid_basic(refs, refname, NULL,
- 				  REF_NO_DEREF,
- 				  &type, &err);
- 	if (!lock) {
-@@ -3068,6 +3069,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
+@@ -3068,6 +3068,17 @@ static int files_reflog_expire(struct ref_store *ref_store,
  		strbuf_release(&err);
  		return -1;
  	}
-+	oid = &lock->old_oid;
- 
- 	/*
- 	 * When refs are deleted, their reflog is deleted before the
-@@ -3111,6 +3113,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
- 		}
- 	}
- 
-+	assert(!unused_oid);
- 	(*prepare_fn)(refname, oid, cb.policy_cb);
- 	refs_for_each_reflog_ent(ref_store, refname, expire_reflog_ent, &cb);
- 	(*cleanup_fn)(cb.policy_cb);
++
++	/*
++	 * When refs are deleted their reflog is deleted before the
++	 * ref itself deleted. This race happens because there's no
++	 * such thing as a lock on the reflog, instead we always lock
++	 * the "loose ref" (even if packet) above with
++	 * lock_ref_oid_basic().
++	 *
++	 * If race happens we've got nothing more to do, we were asked
++	 * to delete the reflog, and it's not there anymore. Great!
++	 */
+ 	if (!refs_reflog_exists(ref_store, refname)) {
+ 		unlock_ref(lock);
+ 		return 0;
 -- 
-2.32.0.956.g6b0c84ceda8
+2.32.0.873.gb6f2f696497
 
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help