Re: Concurrent fetch commands

From: Patrick Steinhardt <hidden>
Date: 2024-01-03 08:11:12

On Wed, Jan 03, 2024 at 08:33:24AM +0100, Patrick Steinhardt wrote:

On Sun, Dec 31, 2023 at 09:27:19AM -0800, Junio C Hamano wrote:

quoted

Stefan Haller [off-list ref] writes:

quoted

I can reliably reproduce this by doing

   $ git fetch&; sleep 0.1; git pull
   [1] 42160
   [1]  + done       git fetch
   fatal: Cannot rebase onto multiple branches.

I see a bug here.

How this _ought_ to work is

 - The first "git fetch" wants to report what it fetched by writing
   into the $GIT_DIR/FETCH_HEAD file ("git merge FETCH_HEAD" after
   the fetch finishes can consume its contents).

 - The second "git pull" runs "git fetch" under the hood.  Because
   it also wants to write to $GIT_DIR/FETCH_HEAD, and because there
   is already somebody writing to the file, it should notice and
   barf, saying "fatal: a 'git fetch' is already working" or
   something.

But because there is no "Do not overwrite FETCH_HEAD somebody else
is using" protection, "git merge" or "git rebase" that is run as the
second half of the "git pull" ends up working on the contents of
FETCH_HEAD that is undefined, and GIGO result follows.

The "bug" that the second "git fetch" does not notice an already
running one (who is in possession of FETCH_HEAD) and refrain from
starting is not easy to design a fix for---we cannot just abort by
opening it with O_CREAT|O_EXCL because it is a normal thing for
$GIT_DIR/FETCH_HEAD to exist after the "last" fetch.  We truncate
its contents before starting to avoid getting affected by contents
leftover by the last fetch, but when there is a "git fetch" that is
actively running, and it finishes _after_ the second one starts and
truncates the file, the second one will end up seeing the contents
the first one left.  We have the "--no-write-fetch-head" option for
users to explicitly tell which invocation of "git fetch" should not
write FETCH_HEAD.

While I agree that it's the right thing to use "--no-write-fetch-head"
in this context, I still wonder whether we want to fix this "bug". It
would be a rather easy change on our side to start using the lockfile
API to write to FETCH_HEAD, which has a bunch of benefits:

  - We would block concurrent processes of writing to FETCH_HEAD at the
    same time (well, at least for clients aware of the new semantics).

  - Consequentially, we do not write a corrupted FETCH_HEAD anymore when
    multiple processes write to it at the same time.

  - We're also more robust against corruption in the context of hard
    crashes due to atomic rename semantics and proper flushing.

I don't really see much of a downside except for the fact that we change
how this special ref is being written to, so other implementations would
need to adapt accordingly. But even if they didn't, if clients with both
the new and old behaviour write FETCH_HEAD at the same point in time the
result would still be a consistent FETCH_HEAD if both writes finish. We
do have a race now which of both versions of FETCH_HEAD we see, but that
feels better than corrupted contents to me.

Ah, one thing I didn't think of is parallel fetches. It's expected that
all of the fetches write into FETCH_HEAD at the same point in time
concurrently, so the end result is the collection of updated refs even
though the order isn't guaranteed. That makes things a bit more
complicated indeed.

Patrick

Attachments

signature.asc [application/pgp-signature] 833 bytes

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help