Thread (12 messages), 2 authors, 2025-03-31

Re: [PATCH 2/2] compat/mingw: fix EACCESS when opening files with `O_CREAT | O_EXCL`

From: Johannes Schindelin <hidden>
Date: 2025-03-16 00:01:27

Hi Patrick,

On Thu, 13 Mar 2025, Patrick Steinhardt wrote:
In our CI systems we can observe that t0610 fails rather frequently.
This testcase races a bunch of git-update-ref(1) processes with one
another which are all trying to update a unique reference, where we
expect that all processes succeed and end up updating the reftable
stack. The error message in this case looks like the following:

    fatal: update_ref failed for ref 'refs/heads/branch-88': reftable: transaction prepare: I/O error
I saw this error plenty of times and was wondering whether there would be
a way to get more useful information in the error message.

After all, I/O errors come in all shapes and forms, and telling the user
that _something_ was wrong but forcing them to recreate the issue in a GDB
session is an excellent recipe to cause frustration.

So I'd like to suggest improving the user experience substantially by
augmenting the rather generic `I/O error` with details as to what
operation failed, with what exact error, on what file.
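
To make the suggestion concrete, here is a minimal sketch of what such an augmented message could carry. The helper name `describe_io_error()` and its signature are hypothetical; nothing like it is claimed to exist in Git or the reftable library. It only shows the three pieces of context asked for above: the operation, the file, and the exact error.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of Git or the reftable library: format an
 * I/O failure as "<operation> '<path>': <error text>" into `buf`.
 * Returns the number of characters snprintf() would have written.
 */
static int describe_io_error(char *buf, size_t len, const char *op,
			     const char *path, int err)
{
	assert(buf && op && path);
	/* strerror() turns the errno value into a human-readable string. */
	return snprintf(buf, len, "%s '%s': %s", op, path, strerror(err));
}
```

A caller that currently returns a bare `REFTABLE_IO_ERROR` could, under this assumption, record something like `describe_io_error(buf, sizeof(buf), "open", lock_path, errno)` before returning, where `lock_path` is likewise an illustrative name.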
Instrumenting the code with a couple of calls to `BUG()` in relevant
sites where we return `REFTABLE_IO_ERROR` quickly leads one to discover
that this error is caused when calling `flock_acquire()`, which is a
thin wrapper around our lockfile API. Curiously, the error code we get
in such cases is `EACCES`, indicating that we are not allowed to access
the file.

The root cause of this is an oddity of `CreateFileW()`, which is what
`_wopen()` uses internally. Quoting its documentation [1]:

    If you call CreateFile on a file that is pending deletion as a
    result of a previous call to DeleteFile, the function fails. The
    operating system delays file deletion until all handles to the file
    are closed. GetLastError returns ERROR_ACCESS_DENIED.

This behaviour is triggered quite often in the above testcase because
all the processes race with one another trying to acquire the lock for
the "tables.list" file. This is due to how locking works in the reftable
library when compacting a stack:

    1. Lock the "tables.list" file and read its contents.

    2. Decide which tables to compact.

    3. Lock each of the individual tables that we are about to compact.

    4. Unlock the "tables.list" file.

    5. Compact the individual tables into one large table.

    6. Re-lock the "tables.list" file.

    7. Write the new list of tables into it.

    8. Commit the "tables.list" file.

The important step is (4): we don't commit the file directly by renaming
it into place, but instead we delete the lockfile so that concurrent
processes can continue to append to the reftable stack while we compact
the tables. And because we use `DeleteFileW()` to do so, we may now race
with another process that wants to acquire that lockfile. So if we are
unlucky, we would now see `ERROR_ACCESS_DENIED` instead of the expected
`ERROR_FILE_EXISTS`, which the lockfile subsystem isn't prepared to
handle and thus it will bail out without retrying to acquire the lock.

In theory, the issue is not limited to the reftable library and can be
triggered by every other user of the lockfile subsystem, as well. My gut
feeling tells me it's rather unlikely to surface elsewhere though.

Fix the issue by translating the error to `EEXIST`. This makes the
lockfile subsystem handle the error correctly: in case a timeout is set
it will now retry acquiring the lockfile until the timeout has expired.
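
The retry behaviour described here can be sketched as follows. This is a simplified POSIX illustration, not Git's lockfile implementation: the function name `acquire_lock_with_timeout()`, the fixed 10 ms step, and the file mode are all assumptions. The point it shows is that only `EEXIST` is treated as "someone else holds the lock, try again", so any other errno ends the loop immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not Git's lockfile code: create `path` exclusively,
 * retrying while it already exists until `timeout_ms` has elapsed.
 * Returns a file descriptor, or -1 with errno set by the last open().
 */
static int acquire_lock_with_timeout(const char *path, long timeout_ms)
{
	const long step_ms = 10; /* fixed delay; illustrative only */
	long waited_ms = 0;

	assert(path);
	for (;;) {
		int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
		if (fd >= 0)
			return fd;
		/* Only "file exists" is retryable; any other error is final. */
		if (errno != EEXIST || waited_ms >= timeout_ms)
			return -1;
		struct timespec ts = { 0, step_ms * 1000000L };
		nanosleep(&ts, NULL);
		waited_ms += step_ms;
	}
}
```

With a loop of this shape, an `EACCES` coming out of the open call aborts on the first attempt, which matches the failure mode described above.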

With this, t0610 is now always passing on my machine whereas it was
previously failing in around 20-30% of all test runs.
It is good that you fixed this issue!

However, `ERROR_ACCESS_DENIED` most often means one of two things:

- The file in question exists but is opened exclusively by another process
  (which might be Defender, the anti-malware scanner), or

- The current user lacks the permission to create this particular file,
  i.e. it is really what `EACCES` would mean on Linux.

While the first condition clearly can be interpreted as "file exists" in
the way this patch wants to do, the latter cannot be. And the patch
touches a function that is not exclusively used by the `lockfile`
machinery; each and every caller of `open(..., ... O_CREAT)` is affected
by this change.

This has ramifications e.g. when running in a worktree where the user has
no write permission (but which they indicated as safe via
`safe.directory`). Git would then no longer report correctly when it cannot
write files because the user lacks permission to do that, but would
instead claim that the file already exists, when that is not true.

Maybe there is a place higher in the stack trace where Git could instead
learn to handle `EACCES`? E.g. treat it the same as `EEXIST`, or maybe
alternatively make it Windows-specific and introduce a back-off plan?
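
As a rough sketch of what such a back-off plan could look like: retry a bounded number of times on `EACCES`, and if it persists, report `EACCES` unchanged rather than rewriting it to `EEXIST`. Everything here is hypothetical, including the names `create_fn`, `create_with_eacces_backoff()` and the stand-in `flaky_create()`; the callback merely takes the place of the platform's exclusive-create call so that the idea can be shown portably.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback standing in for the platform's exclusive create. */
typedef int (*create_fn)(const char *path, void *data);

/*
 * Hypothetical sketch: retry a bounded number of times when the create
 * fails with EACCES, assuming the failure may be transient (for example a
 * file that is pending deletion). If every attempt fails, EACCES is
 * reported as-is, so a real permission problem is not misreported.
 */
static int create_with_eacces_backoff(create_fn create, void *data,
				      const char *path, int max_attempts)
{
	int attempt;

	assert(create && path && max_attempts > 0);
	for (attempt = 0; attempt < max_attempts; attempt++) {
		int fd = create(path, data);
		if (fd >= 0 || errno != EACCES)
			return fd;
		/* A real implementation would sleep here, with growing delay. */
	}
	errno = EACCES;
	return -1;
}

/* Stand-in for illustration: fails with EACCES until *data reaches 0. */
static int flaky_create(const char *path, void *data)
{
	int *remaining = data;

	(void)path;
	if (*remaining > 0) {
		(*remaining)--;
		errno = EACCES;
		return -1;
	}
	return 3; /* pretend file descriptor */
}
```

The trade-off is that a user who truly lacks write permission waits for the retries to run out before seeing the error, but the error they eventually see is the correct one.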

Ciao,
Johannes
[1]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew

Signed-off-by: Patrick Steinhardt <redacted>
---
 compat/mingw.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)
diff --git a/compat/mingw.c b/compat/mingw.c
index 101e380c5a3..fb61de759c7 100644
--- a/compat/mingw.c
+++ b/compat/mingw.c
@@ -644,6 +644,19 @@ int mingw_open (const char *filename, int oflags, ...)

 	fd = open_fn(wfilename, oflags, mode);

+	/*
+	 * Internally, `_wopen()` uses the `CreateFile()` API with CREATE_NEW,
+	 * which may error out with ERROR_ACCESS_DENIED when the file is
+	 * scheduled for deletion via `DeleteFileW()`. The file essentially
+	 * exists, so we map this error to ERROR_ALREADY_EXISTS so that callers
+	 * don't have to special-case this.
+	 *
+	 * This fixes issues for example with the lockfile interface when one
+	 * process has a lock that it is about to commit or release while
+	 * another process wants to acquire it.
+	 */
+	if (fd < 0 && create && GetLastError() == ERROR_ACCESS_DENIED)
+		errno = EEXIST;
 	if (fd < 0 && (oflags & O_ACCMODE) != O_RDONLY && errno == EACCES) {
 		DWORD attrs = GetFileAttributesW(wfilename);
 		if (attrs != INVALID_FILE_ATTRIBUTES && (attrs & FILE_ATTRIBUTE_DIRECTORY))

--
2.49.0.rc2.394.gf6994c5077.dirty
