Thread (16 messages) 16 messages, 7 authors, 2022-10-20

Re: [PATCH] tmp-objdir: do not opendir() when handling a signal

From: John Cai <hidden>
Date: 2022-09-27 13:50:49

Hey Peff,

On 27 Sep 2022, at 7:44, Jeff King wrote:
On Mon, Sep 26, 2022 at 11:53:03PM +0000, John Cai via GitGitGadget wrote:
quoted
One place we call tmp_objdir_create() is in git-receive-pack, where
we create a temporary quarantine directory "incoming". Incoming
objects will be written to this directory before they get moved to
the object directory.

We have observed this code leading to a deadlock:

	Thread 1 (Thread 0x7f621ba0b200 (LWP 326305)):
	#0  __lll_lock_wait_private (futex=futex@entry=0x7f621bbf8b80
		<main_arena>) at ./lowlevellock.c:35
	#1  0x00007f621baa635b in __GI___libc_malloc
		(bytes=bytes@entry=32816) at malloc.c:3064
	#2  0x00007f621bae9f49 in __alloc_dir (statp=0x7fff2ea7ed60,
		flags=0, close_fd=true, fd=5)
		at ../sysdeps/posix/opendir.c:118
	#3  opendir_tail (fd=5) at ../sysdeps/posix/opendir.c:69
	#4  __opendir (name=<optimized out>)
		at ../sysdeps/posix/opendir.c:92
	#5  0x0000557c19c77de1 in remove_dir_recurse ()
	#6  0x0000557c19d81a4f in remove_tmp_objdir_on_signal ()
	#7  <signal handler called>
Yuck. Your analysis looks right, and certainly opendir() can't really
work without allocating memory for the pointer-to-DIR.
quoted
To prevent this, add a flag REMOVE_DIR_SIGNAL that allows
remove_dir_recurse() to know that a signal is being handled and avoid
calling opendir(3). One implication of this change is that when
signal handling, the temporary directory may not get cleaned up
properly.
It's not really "may not" here, is it? It will never get cleaned up on a
signal now. I don't think remove_dir_recursively() will try to rmdir()
in this case. But even if it did, we'll always have a "pack"
subdirectory (minus the small race before we've created it).

That's unfortunate, but I don't think we really have a portable
alternative. We can't keep an exact list of files to be deleted, because
some of them will be created by sub-processes. We could perhaps exec a
helper that does the deletion, but that seems like a race and
portability nightmare. On Linux, we could probably use open() and
getdents64() to traverse, but obviously that won't work everywhere. It
_might_ be worth having some kind of compat/ wrapper here, to let
supported systems do as good a job as they can. But it's not like there
aren't already cases where we might leave the tmp-objdir directory
around (say, SIGKILL), so this is really just extending the existing
problem to more signals.

I was going to suggest we should do a better job of cleaning up these
directories via git-gc. But it looks like b3cecf49ea (tmp-objdir: new
API for creating temporary writable databases, 2021-12-06) changed the
default name such that a regular git-gc should do so. So I think we're
covered there.
I was wondering about this as well. Thanks for checking on this--that's
reassuring.
The main case we care about is normal exit when index-pack or a hook
sees an error, in which case we should still be cleaning up via the
atexit() handler.

So I think your patch is going in the right direction, but...
quoted
 static int remove_dir_recurse(struct strbuf *path, int flag, int *kept_up)
 {
-	DIR *dir;
+	DIR *dir = NULL;
 	struct dirent *e;
 	int ret = 0, original_len = path->len, len, kept_down = 0;
 	int only_empty = (flag & REMOVE_DIR_EMPTY_ONLY);
@@ -3261,7 +3261,10 @@ static int remove_dir_recurse(struct strbuf *path, int flag, int *kept_up)
 	}

 	flag &= ~REMOVE_DIR_KEEP_TOPLEVEL;
-	dir = opendir(path->buf);
+
+	if ((flag & REMOVE_DIR_SIGNAL) == 0)
+		dir = opendir(path->buf);
+
 	if (!dir) {
 		if (errno == ENOENT)
 			return keep_toplevel ? -1 : 0;
We skip calling opendir() entirely, so "dir" will still be NULL. But we
immediately start looking at errno, which will have some undefined value
(based on some previous syscall).

If we set "errno" to "EACCES" in this case, then we'd at least hit the
rmdir() below:

         if (!dir) {
                  if (errno == ENOENT)
                          return keep_toplevel ? -1 : 0;
                  else if (errno == EACCES && !keep_toplevel)
                          /*
                           * An empty dir could be removable even if it
                           * is unreadable:
                           */
                          return rmdir(path->buf);
                  else
                          return -1;
          }

but we know it won't really do anything for our proposed caller, since
it will have files inside the directory that need to be removed before
rmdir() can work.
yeah, I suppose the only case it would help is if the directory is already
empty.
quoted hunk ↗ jump to hunk
Moreover, if you were to combine REMOVE_DIR_SIGNAL with
REMOVE_DIR_KEEP_NESTED_GIT, I suspect that the call to
resolve_gitlink_ref() would end up with similar deadlocks. Obviously
nobody is proposing to do that, but it is a pitfall in the API.

So all of that makes me think we should not add a new flag here, but
instead just avoid calling the function entirely from
tmp_objdir_destroy_1().

But then we can observe that tmp_objdir_destroy_1() is basically doing
nothing if on_signal is set. So there is really no point in setting up
the signal handler at all. We should just set up the atexit() handler.
I.e., something like:
diff --git a/tmp-objdir.c b/tmp-objdir.c
index a8be92bca1..10549e95db 100644
--- a/tmp-objdir.c
+++ b/tmp-objdir.c
@@ -169,7 +169,6 @@ struct tmp_objdir *tmp_objdir_create(const char *prefix)
 	the_tmp_objdir = t;
 	if (!installed_handlers) {
 		atexit(remove_tmp_objdir);
-		sigchain_push_common(remove_tmp_objdir_on_signal);
 		installed_handlers++;
 	}
This makes sense and is a clean solution. I guess the only case where we would
benefit in calling into remove_tmp_objdir_on_signal() is if the temp dir exists
but is empty. I'm not sure how often this would happen to make it worth it?
Probably not...

with the commit message explaining that we can't do the cleanup in a
portable and signal-safe way, so we just punt on the whole concept.

There's also some minor cleanup we could do elsewhere to drop the
"on_signal" argument (which can come as part of the same patch, or on
top).

-Peff
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help