Re: [PATCH 11/11] test-lib: clear watchman watches at test completion

From: Derrick Stolee <hidden>
Date: 2019-12-10 01:43:42

On 12/9/2019 6:40 PM, SZEDER Gábor wrote:

On Mon, Dec 09, 2019 at 09:12:37AM -0500, Derrick Stolee wrote:

+		watchman watch-list |

Then with the above fixed, trying to run 'watchman' triggers another
error if it's not installed:

  $ GIT_TEST_FSMONITOR="$PWD"/t7519/fsmonitor-none ./t5570-git-daemon.sh 
  [...]
  ok 21 - hostname interpolation works after LF-stripping
  ./t5570-git-daemon.sh: 1484: ./t5570-git-daemon.sh: watchman: not found
  # failed 1 among 21 test(s)

I think we need an additional condition to run this only if
't7519/fsmonitor-watchman' is used in the tests.

The intention is to enable a test-suite-wide run using GIT_TEST_FSMONITOR,
and that can only use watchman (currently).
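For illustration, the extra condition suggested above could be sketched as a small guard. The helper name fsmonitor_uses_watchman is hypothetical (it is not part of the patch), and matching on the hook's file name is an assumption about how the watchman hook would be recognized:

```shell
# Hypothetical helper, not part of the patch: succeed only if
# GIT_TEST_FSMONITOR points at the watchman hook and a 'watchman'
# command can actually be found.
fsmonitor_uses_watchman () {
	case "$GIT_TEST_FSMONITOR" in
	*/fsmonitor-watchman)
		;;
	*)
		# Unset, empty, or some other hook (e.g. fsmonitor-none).
		return 1
		;;
	esac
	command -v watchman >/dev/null 2>&1
}
```

With a guard like this, the run with 't7519/fsmonitor-none' shown above would skip the cleanup instead of failing with "watchman: not found".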

I've just run 'GIT_TEST_FSMONITOR=$(pwd)/t7519/fsmonitor-all make',
and it only failed one test in 't0090-cache-tree.sh', but the fix is
already in 'pu' in 61eea521fe (fsmonitor: do not compare bitmap size
with size of split index, 2019-11-13).

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 30b07e310f..067a432ea5 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh

@@ -1072,6 +1072,8 @@ test_atexit_handler () {
 	# sure that the registered cleanup commands are run only once.
 	test : != "$test_atexit_cleanup" || return 0
 
+	test_clear_watchman

I'm not sure where to put this call, but this is definitely not the
right place for it.  See that 'return 0' above in the context?  That's
where the test_atexit_handler function returns early when no atexit
handler commands are set, i.e. in all test scripts that don't involve
some kind of daemons, thus this call is not invoked in the majority of
test scripts.

Ah, I misunderstood the point of test_atexit_handler.

Simply moving this call before that early return is not good, because
then it would be invoked twice.

An option would be to register this call as an atexit command
somewhere late in 'test-lib.sh' (around where GIT_TEST_GETTEXT_POISON
is restored, perhaps).  That way it would be invoked most of the time,
and it would be invoked only once, but I'm not sure how it would work
out with test scripts that unset GIT_TEST_FSMONITOR somewhere in the
middle for the remainder of the test script.  However, register the
atexit command only if GIT_TEST_FSMONITOR is set (to something
watchman-specific), so it won't be invoked at all if
GIT_TEST_FSMONITOR is not set, and thus it won't generate additional
test output and trace.

I don't have a better idea.
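A minimal sketch of that conditional registration, assuming it is called once late in 'test-lib.sh', after test_atexit (from test-lib-functions.sh) and test_clear_watchman (from this patch) have been defined. The wrapper name register_watchman_cleanup is hypothetical:

```shell
# Hypothetical wrapper: register the watchman cleanup as an atexit
# command, but only when GIT_TEST_FSMONITOR names the watchman hook.
# Relies on test_atexit and test_clear_watchman being defined by the
# time it is called.
register_watchman_cleanup () {
	case "$GIT_TEST_FSMONITOR" in
	*/fsmonitor-watchman)
		test_atexit test_clear_watchman
		;;
	esac
}
```

Note that the decision is taken at registration time, so a test script that unsets GIT_TEST_FSMONITOR later on would still get the cleanup call unless the cleanup function re-checks the variable itself.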

Shouldn't it be sufficient to add it into test_done? If the test fails,
then we could leave watches open, but that's no worse than we had without
this test_clear_watchman method.

I don't know enough about watchman to have an informed opinion.

I think the answer mainly depends on what we want to achieve and what
happens when a test script run with GIT_TEST_FSMONITOR exits without
invoking 'test_done' and is then re-executed (e.g. after a test case fails with
'--immediate' or when the user hits ctrl-c or closes the terminal
window mid-test).

As far as I understand the commit message of v2 of this patch [1], we
mainly want two things:

  - Avoid overloading watchman's watch queue.  For this it might
    indeed be sufficient to clear watches in 'test_done', because most
    test scripts tend to succeed most of the time.

  - Make GIT_TEST_FSMONITOR work reliably on Windows.  For this, I'm
    afraid it's not enough in general, because after a failure with
    '--immediate' or after a ctrl-c we won't run 'test_done', so we
    won't clear the watches, and watchman will keep the fd to the
    trash dir open, and, consequently, will interfere with subsequent
    executions of the same test script as it can't delete the still
    existing trash dir left over from the previous run.

You are right. Running an individual test and ending it early would
lead to these leaked handles. This assumes someone is aware of the
GIT_TEST_FSMONITOR environment variable, so they are at least
interacting with the feature directly to some extent.

    It could still be sufficient for fsmonitor-enabled CI builds,
    though, because there we don't re-run tests, don't hit ctrl-c, and
    (at least on Azure Pipelines) don't use '--immediate', and the
    whole VM/container/whatever is thrown away at the end anyway.

This is the hope. It would be nice to get to that point.

    On Linux/Unix-y systems it probably doesn't matter much, because
    they can delete open directories, but I wonder what happens with a
    watch when the directory it is supposed to observe gets deleted.  If
    the watch is removed in this case, great; if it isn't, then...
    well, then what happens with it?  Will it be overwritten with the
    next test run, or will there be duplicate watches for the same
    dir?

When a directory is deleted from under Watchman on Linux, the watch
is removed...eventually. I'm not sure at exactly what point that happens.
At the very least, Watchman will receive and process the signals for all
of the paths being removed inside the directory. Running 'watch-del'
removes that overhead.
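As a rough sketch of what that explicit removal could look like (not necessarily what the patch does): the helper name clear_watches_under is hypothetical, and the parsing assumes 'watchman watch-list' prints each watched root as a quoted string on its own line.

```shell
# Hypothetical helper: delete every watch whose root contains the
# directory given as $1.
# Assumes 'watchman watch-list' prints one quoted root per line.
clear_watches_under () {
	watchman watch-list |
	# Keep only the lines that mention the given directory.
	grep -F "$1" |
	# Strip the indentation, the quotes and the trailing comma.
	sed -e 's/^[[:space:]]*"//' -e 's/",\{0,1\}$//' |
	while read -r root
	do
		watchman watch-del "$root"
	done
}
```

It would be called with the trash directory, e.g. clear_watches_under "$TRASH_DIRECTORY". Paths that contain characters the JSON output escapes (such as backslashes) would need extra handling.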

Thanks,
-Stolee
