Thread (4 messages) 4 messages, 2 authors, 2026-01-20

Re: [PATCH 2/2] t0610-reftable-basics: mitigate a flaky test on cygwin

From: Ramsay Jones <hidden>
Date: 2026-01-19 17:13:57


On 19/01/2026 6:50 am, Patrick Steinhardt wrote:
On Fri, Jan 16, 2026 at 08:39:56PM +0000, Ramsay Jones wrote:
quoted
Test #29 ('ref transaction: corrupted tables cause failure') started to
fail intermittently for me (from v2.52.0-rc0) when running the testsuite
with '-j8'. (Also, having moved to a new laptop and windows 11, rather
than windows 10). If the test is run by hand, or without any parallelism,
then it passes without issue.

When the test fails (e.g. 1 out of 32 parallel runs) the cause is due to
a permission error while corrupting a table file:

  ./test-lib.sh: line 1010: .git/reftable/0x000000000001-0x000000000002-d89bb8ee.ref: Permission denied
This rings a bell. I remember that we discussed a case at some point in
time where a redirect converted to `test-tool truncate` fixed a flake on
Cygwin.
Indeed, the mail thread starts at:

  https://lore.kernel.org/git/f22c95ad-43c8-41de-8315-e707224e830b@ramsayjones.plus.com/ (local)
quoted
This corruption is done in a shell loop, directly after a 'test_commit',
which uses an ': >"$f"' expression to truncate the file. Adding a sleep
of one second after the 'test_commit' and before the shell loop fixes
the test (it is not clear why). Replacing the redirection shell expression
with a 'test-tool truncate "$f" 0' invocation also provides a fix, which
could simply be another way to change the timing sufficiently to win the
race.

During a debug session, I tried looking at the strace output for the
shell redirection:

  $ rm /tmp/hello; echo hello >/tmp/hello; ls -l /tmp/hello
  -rw-r--r-- 1 ramsay None 6 Nov 10 17:25 /tmp/hello
  $

  $ strace -o zzz bash -c ': >/tmp/hello'
  $

Similarly, for the test-tool solution:

  $ strace -o xxx ./t/helper/test-tool truncate /tmp/hello 0
  $

When comparing the output, the differences seemed to be what you would
expect and, if anything, the shell redirect probably would have taken
longer than the test-tool solution (many fcntl() calls to dup the stdout
to the <fd>).  The call to the win32 api NtCreateFile() was identical,
apart from the first (FileHandle) parameter, of course.
Too bad. I stil wonder whether it is the extra process that we spawn
that ends up fixing the issue.
Well, a 'sleep 1' before the shell loop also fixes the issue. I hate to
mention the 'windows delays updating some file attributes until after the
process has exited' conspiracy theory, but ... :) (yeah, I just don't
think that is possible, except ...)
quoted
In order to fix this flaky test on cygwin, despite not knowing why it
works, replace the shell redirection with the above 'test-tool truncate'
invocation.

Helped-by: Patrick Steinhardt [off-list ref]
Oh, so is this the exact case that we were talking about? If so, it
might make sense to link to the mail thread so that folks can also read
a bit into our discussion around this.
Indeed! I thought about referencing the email thread, but I decided that it
didn't really offer any more supporting evidence than the commit message
(in fact less - it doesn't mention the 'strace' scan).

I can add that (again [1]), if you think it's worth it, but I just re-read
the email thread and I'm not convinced it offers much extra value. So, I would
rather not re-roll, but I will if you think it worth it. Let me know.

[1] https://lore.kernel.org/git/f22c95ad-43c8-41de-8315-e707224e830b@ramsayjones.plus.com/ (local)
quoted
Signed-off-by: Ramsay Jones <redacted>
---
 t/t0610-reftable-basics.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/t/t0610-reftable-basics.sh b/t/t0610-reftable-basics.sh
index 6575528f21..e19e036898 100755
--- a/t/t0610-reftable-basics.sh
+++ b/t/t0610-reftable-basics.sh
@@ -207,7 +207,7 @@ test_expect_success 'ref transaction: corrupted tables cause failure' '
 		test_commit file1 &&
 		for f in .git/reftable/*.ref
 		do
-			: >"$f" || return 1
+			test-tool truncate "$f" 0 || return 1
 		done &&
 		test_must_fail git update-ref refs/heads/main HEAD
 	)
In any case, if it seems to reliably fix the issue I'd say we just merge
it. It's unfortunate that we haven't been able to figure out the root
cause, but so be it.
Agreed!

Thanks!

ATB,
Ramsay Jones


Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help