Thread (22 messages) 22 messages, 5 authors, 2020-12-03

Re: [PATCH 1/6] fixup! reftable: rest of library

From: Johannes Schindelin <hidden>
Date: 2020-12-02 12:34:21

Hi Han-Wen,

On Tue, 1 Dec 2020, Han-Wen Nienhuys wrote:
On Sat, Nov 28, 2020 at 7:44 AM Johannes Schindelin via GitGitGadget
[off-list ref] wrote:
quoted
From: Johannes Schindelin <redacted>

Close the file descriptors to obsolete files before trying to delete or
rename them. This is actually required on Windows.

Note: this patch is just a band-aid to get the tests pass on Windows.
The fact that it is needed raises concerns about the overall resource
handling: are file descriptors closed properly whenever appropriate, or
are they closed much later (which can lead to rename() problems on
Windows, and risks running into ulimits)?

Also, a `reftable_stack_destroy()` call had to be moved in
`test_reftable_stack_uptodate()` to avoid the prompt complaining that a
`.ref` file could not be deleted on Windows. This raises the question
whether the code does the right thing when two concurrent processes want
to access the reftable, and one wants to compact it. At the moment, it
does not appear to fail gracefully.
Thanks for the report; I have to look more closely at these fixes; I
fear they might be incorrect.
They might be incorrect, but less so than the previous state, as testified
by the previously failing PR build.
The reftable spec doesn't treat this case in depth, and I think it was
rather written for Unix-like semantics. In the Unix flavor, a process
that wants to read can keep file descriptors open to keep reading from
the ref DB at a consistent snapshot.
Thanks for the explanation. I actually knew that.
What is the approach that the rest of Git on Windows takes in these
circumstances?
The rest of Git (whether on Windows or not) treats this as a no-no. You
cannot keep a handle open to a file that is deleted.
Consider processes P1 and P2, and the following sequence of actions

P1 opens ref DB (ie. opens a set of *.ref files for read)
P2 opens ref DB, executes a transaction. Post-transaction, it compacts
the reftable stack.
P2 exits
P1 exits

Currently, the compaction in P2 tries to delete the files obviated by
the compaction. On Windows this currently fails, because you can't
delete open files.
Indeed. So the design needs to be fixed, if it fails.
There are several options:

1) P2 should fail the compaction. This is bad because it will lead to
degraded performance over time, and it's not obvious if you can
anticipate that the deletion doesn't work.
2) P2 should retry deleting until it succeeds. This is bad, because a
reader can starve writers.
3) on exit, P1 should check if its *.ref files are still in use, and
delete them. This smells bad, because P1 is a read-only process, yet
it executes writes. Also, do we have good on-exit hooks in Git?
4) On exit, P1 does nothing. Stale *.ref files are left behind. Some
sort of GC process cleans things up asynchronously.
5) The ref DB should not keep files open, and should rather open and
close files as needed; this means P1 doesn't keep files open for long,
and P2 can retry safely.

I think 3) seems the cleanest to me (even though deleting in read
process feels weird), but perhaps we could fallback to 5) on windows
as well.
Traditionally, Git would fail gracefully (i.e. with a warning) to delete
the stale files, and try again at a later stage (during `git gc --auto`,
for example, or after the next compaction step).
What errno code does deleting an in-use file on Windows produce?
I believe it would be `EACCES`. See
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/unlink-wunlink?view=msvc-160
for the documented behavior (I believe that an in-use file is treated the
same way as a read-only file in this instance).

Ciao,
Dscho
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help