Thread: 4 messages, 2 authors, 2024-10-02

Re: fsmonitor deadlock / macOS CI hangs

From: Koji Nakamaru <hidden>
Date: 2024-10-02 01:46:16

On Tue, Oct 1, 2024 at 4:46 AM Jeff King [off-list ref] wrote:
> I did some more digging on the hangs we sometimes see when running the
> test suite on macOS. I'm cc-ing Patrick as somebody who dug into this
> before, and Johannes as the only still-active person mentioned in the
> relevant code.
>
> For those just joining, you can reproduce the issue by running t9211
> with --stress on macOS. Some earlier notes are here:
>
>   https://lore.kernel.org/git/20240517081132.GA1517321@coredump.intra.peff.net/
>
> but the gist of it is that we end up with Git processes waiting to read
> from fsmonitor, but fsmonitor hanging.

I may have found the cause. fsmonitor_run_daemon_1() starts the fsevent
listener thread before with_lock__wait_for_cookie() is called.

      /*
       * Start the fsmonitor listener thread to collect filesystem
       * events.
       */
      if (pthread_create(&state->listener_thread, NULL,
                         fsm_listen__thread_proc, state)) {
              ipc_server_stop_async(state->ipc_server_data);
              err = error(_("could not start fsmonitor listener thread"));
              goto cleanup;
      }
      listener_started = 1;

fsm_listen__thread_proc() starts the following:

      fsm_listen__loop(state);

which is defined as below for darwin:

  void fsm_listen__loop(struct fsmonitor_daemon_state *state)
  {
          struct fsm_listen_data *data;

          data = state->listen_data;

          pthread_mutex_init(&data->dq_lock, NULL);
          pthread_cond_init(&data->dq_finished, NULL);
          data->dq = dispatch_queue_create("FSMonitor", NULL);

          FSEventStreamSetDispatchQueue(data->stream, data->dq);
          data->stream_scheduled = 1;

          if (!FSEventStreamStart(data->stream)) {
                  error(_("Failed to start the FSEventStream"));
                  goto force_error_stop_without_loop;
          }
          data->stream_started = 1;

          ...

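Normally FSEventStreamStart() is called before
with_lock__wait_for_cookie() creates a cookie file, but this is not
guaranteed. If the cookie file is created first, the stream is not yet
running, so presumably the event for the cookie file is never delivered
and the waiter keeps waiting. We can reproduce the issue easily if we
modify fsm_listen__loop() as below: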

  --- a/compat/fsmonitor/fsm-listen-darwin.c
  +++ b/compat/fsmonitor/fsm-listen-darwin.c
  @@ -510,6 +510,7 @@ void fsm_listen__loop(struct fsmonitor_daemon_state *state)
          FSEventStreamSetDispatchQueue(data->stream, data->dq);
          data->stream_scheduled = 1;

  +       sleep(1);
          if (!FSEventStreamStart(data->stream)) {
                  error(_("Failed to start the FSEventStream"));
                  goto force_error_stop_without_loop;
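
To make the ordering problem concrete, here is a toy model of it. This is
only a sketch and not Git code: every name in it (toy_daemon, toy_listener,
toy_create_cookie, toy_run) is hypothetical, and a condition-variable gate
stands in for the sleep(1) above so that the outcome is deterministic. It
assumes an event is simply lost when the cookie is created before the
stream has been started.

```c
#include <pthread.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in Git. */
struct toy_daemon {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool may_start;      /* gate standing in for the sleep(1) delay */
	bool stream_started; /* models data->stream_started */
	bool cookie_seen;    /* set only if the cookie arrives after start */
};

/* Listener thread: "starts the stream" once the gate is opened. */
static void *toy_listener(void *arg)
{
	struct toy_daemon *d = arg;

	pthread_mutex_lock(&d->lock);
	while (!d->may_start)
		pthread_cond_wait(&d->cond, &d->lock);
	d->stream_started = true; /* FSEventStreamStart() analogue */
	pthread_cond_broadcast(&d->cond);
	pthread_mutex_unlock(&d->lock);
	return NULL;
}

/* Models creating the cookie file: the event is lost if the stream is not running. */
static void toy_create_cookie(struct toy_daemon *d)
{
	pthread_mutex_lock(&d->lock);
	if (d->stream_started)
		d->cookie_seen = true;
	pthread_mutex_unlock(&d->lock);
}

/*
 * Returns true if the cookie event was observed. With wait_for_stream
 * false, the cookie is created before the stream starts (the failing
 * order); with true, the caller first waits until the stream is running.
 */
static bool toy_run(bool wait_for_stream)
{
	struct toy_daemon d = { .may_start = false };
	pthread_t tid;
	bool seen;

	pthread_mutex_init(&d.lock, NULL);
	pthread_cond_init(&d.cond, NULL);
	if (pthread_create(&tid, NULL, toy_listener, &d))
		return false;

	if (!wait_for_stream)
		toy_create_cookie(&d); /* stream not started yet: event lost */

	/* Open the gate and wait until the listener reports it has started. */
	pthread_mutex_lock(&d.lock);
	d.may_start = true;
	pthread_cond_broadcast(&d.cond);
	while (!d.stream_started)
		pthread_cond_wait(&d.cond, &d.lock);
	pthread_mutex_unlock(&d.lock);

	if (wait_for_stream)
		toy_create_cookie(&d); /* stream running: event observed */

	pthread_join(tid, NULL);
	seen = d.cookie_seen;
	pthread_cond_destroy(&d.cond);
	pthread_mutex_destroy(&d.lock);
	return seen;
}
```

In this model toy_run(false) returns false (the cookie is never seen, which
in the real daemon would mean a waiter that never wakes up), while
toy_run(true) returns true. Whether a similar "wait until the stream has
started" handshake is the right fix for fsmonitor itself is a separate
question.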


Koji Nakamaru