Re: [PATCH v3 3/4] daemon: use sigaction() to install child_handler()
From: Carlo Marcelo Arenas Belón <hidden>
Date: 2025-06-26 16:36:04
On Thu, Jun 26, 2025 at 08:33:29AM -0800, Junio C Hamano wrote:
Phillip Wood [off-list ref] writes:quoted
Only because you have chosen to use SA_RESTART here. I think it would be better to drop that and instead say something like POSIX leaves several aspects of the behavior of signal() up to the implementer. It is unspecified whether long running syscalls are restarted after an interrupt is received. The event loop in "git daemon" assumes that poll() will return with EINTR if a child exits while it is polling but if poll() is restarted after an interrupt is received that does not happen which can lead to a build up of uncollected zombie processes. It is also unspecified whether the handler is reset after an interrupt is received. In order to work correctly on operating systems such as Solaris where the handler for SIGCLD is reset when a signal is received "git daemon" calls signal() from the SIGCLD handler to re-establish a handler for that signal. Unfortunately this causes infinite recursion on other operating systems such as AIX. Both of these problems can be addressed by using sigaction() instead of signal() which has much stricter semantics and so, by setting the appropriate flags, we can rely on poll() being interrupted and the SIGCLD handler not being reset. This change means that all long running system calls could fail with EINTR not just pall() but rest of the event loop code is designed to gracefully handle that. After this change there is still a race where a child that exits after it has been checked in check_dead_children() but before we call poll() will not be collected until a new connection is received or a child exits while we're polling. We could fix this by using the "self-pipe trick" but do not because .... Then we can drop patches 1 and 4.Thanks for a suggestion. I really like the "everybody in these code paths are prepared to receive and handle EINTR just fine, so we do not have to do the SA_RESTART" observation very much.
Except that is unlikely to be true, as the code has changed a lot on those 20 years (including that it now uses run_command) and that we had been effectively using SA_RESTART under the covers for most of that time because signal() behaviour changed. An obvious bug: (ex: 20250626161038.85966-1-carenas@gmail.com) I think using SA_RESTART by default might be safer, specially considering that patch 4 already exist and it is not that complicated now that we have 2. Carlo