--- v2
+++ v3
@@ -20,16 +20,28 @@
1. Lookup the reflog name/OID via dwim_log()
2. With that OID, lock the reflog
- 3. Later in builtin/reflog.c we use the we looked as input to
+ 3. Later in builtin/reflog.c we use the OID we looked as input to
lookup_commit_reference_gently(), assured that it's equal to the
OID we got from dwim_log().
-What do I mean with "mostly" above? It mostly mitigates it because
-we'll still run into cases where the ref is locked and being updated
-as we want to expire it, and other git processes wanting to update the
-refs will in turn race with us as we expire the reflog.
-
-This remaining race can in turn be mitigated with the
+We can be sure that this change is safe to make because between
+dwim_log (step #1) and lock_ref_oid_basic (step #2) there was no other
+logic relevant to the OID or expiry run in the cmd_reflog_expire()
+caller.
+
+We can thus treat that code as a black box, before and after this
+change it would get an OID that's been locked, the only difference is
+that now we mostly won't be failing to get the lock due to the TOCTOU
+race[0]. That failure was purely an implementation detail in how the
+"current OID" was looked up, it was divorced from the locking
+mechanism.
+
+What do we mean with "mostly"? It mostly mitigates it because we'll
+still run into cases where the ref is locked and being updated as we
+want to expire it, and other git processes wanting to update the refs
+will in turn race with us as we expire the reflog.
+
+That remaining race can in turn be mitigated with the
core.filesRefLockTimeout setting, see 4ff0f01cb7 ("refs: retry
acquiring reference locks for 100ms", 2017-08-21). In practice if that
value is high enough we'll probably never have ref updates or reflog
@@ -38,14 +50,16 @@
See [1] for an initial report of how this impacted "git gc" and a
large discussion about this change in early 2019. In particular patch
-looked good to Michael Haggerty his[2], but that seems to not have
-made it to the ML archive, its content is quoted in full in my [3].
+looked good to Michael Haggerty, see his[2]. That message seems to not
+have made it to the ML archive, its content is quoted in full in my
+[3].
I'm leaving behind now-unused code the refs API etc. that takes the
now-NULL "oid" argument, and other code that can be simplified now
that we never have on OID in that context, that'll be cleaned up in
subsequent commits, but for now let's narrowly focus on fixing the
-"git gc" issue
+"git gc" issue. As the modified assert() shows we always pass a NULL
+oid to reflog_expire() now.
Unfortunately this sort of probabilistic contention is hard to turn
into a test. I've tested this by running the following three subshells
@@ -79,7 +93,7 @@
)
Before this change the "reflog expire" would fail really quickly with
-a "but expected" error.
+the "but expected" error noted above.
After this change both the "pull" and "reflog expire" will run for a
while, but eventually fail because I get unlucky with
@@ -102,8 +116,10 @@
With core.filesRefLockTimeout set to 10 seconds (it can probably be a
lot lower) I managed to run all four of these concurrently for about
an hour, and accumulated ~125k commits, auto-gc's and all, and didn't
-have a single failure.
-
+have a single failure. The loops visibly stall while waiting for the
+lock, but that's expected and desired behavior.
+
+0. https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use
1. https://lore.kernel.org/git/87tvg7brlm.fsf@evledraar.gmail.com/
2. http://lore.kernel.org/git/b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu
3. https://lore.kernel.org/git/87pnqkco8v.fsf@evledraar.gmail.com/
@@ -111,8 +127,8 @@
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
builtin/reflog.c | 13 ++++++-------
- refs/files-backend.c | 2 +-
- 2 files changed, 7 insertions(+), 8 deletions(-)
+ refs/files-backend.c | 4 ++--
+ 2 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 9f9e6bceb03..4506852c471 100644
@@ -172,7 +188,7 @@
should_expire_reflog_ent,
reflog_expiry_cleanup,
diff --git a/refs/files-backend.c b/refs/files-backend.c
-index f73637fa087..cb8c64cffb5 100644
+index 92f49cfb7d4..819351c82fc 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3060,7 +3060,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
@@ -184,6 +200,15 @@
REF_NO_DEREF,
&type, &err);
if (!lock) {
+@@ -3111,7 +3111,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
+ }
+ }
+
+- assert(oideq(&lock->old_oid, oid));
++ assert(!oid);
+ (*prepare_fn)(refname, &lock->old_oid, cb.policy_cb);
+ refs_for_each_reflog_ent(ref_store, refname, expire_reflog_ent, &cb);
+ (*cleanup_fn)(cb.policy_cb);
--
-2.32.0.873.gb6f2f696497
-
+2.32.0.874.ge7a9d58bfcf
+