Re: [PATCH 1/3] usage: refactor die-recursion checks

From: Brandon Casey <hidden>
Date: 2016-06-15 22:56:51

On Mon, Apr 15, 2013 at 5:42 PM, Jeff King [off-list ref] wrote:

On Mon, Apr 15, 2013 at 05:11:36PM -0700, Brandon Casey wrote:

quoted

+static void check_die_recursion(const char *fmt, va_list ap)
+{
+       static int dying;
+
+       if (!dying++)
+               return;
+
+       vreportf("fatal: ", fmt, ap);

How do you know it's safe to call vreportf() ?

Because I hand-audited it.

:)

But I think the more important question you
are getting at is: how do I know that it will remain safe to call
vreportf?

Right.

quoted

If the bug is in the vreportf code path, we will recurse infinitely
(at least until the stack is used up). An implementation of vsnprintf
exists in compat/snprintf.c for example.

Right. My assumption was that we are primarily interested in protecting
against the die_routine. Compat functions should never be calling die.

I think the rule we have been enforcing is less strict than that.  We
have only said that any compat function in a die handler path should
never call die.  But maybe that's what you meant.

Of course that is assuming nobody violates that rule, which is part of
the point of the check.

quoted

It's nice to print out the error message here, but I think doing so
defeats the purpose of this "dying" check.  Better to get the stack
trace from a core dump.

Easier said than done, sometimes, if you are debugging somebody else's
problem from a dump of a terminal session. :)

But I can live with dropping this patch; my primary goal is the bug-fix
in patch three.

I was also tempted to suggest just dropping the recursion check
altogether. While it is neat to detect such things, it's a "should never
happen" bug situation, and an infinite loop of printing out the same
message is pretty easy to notice. Do you remember what spurred the
original check? The message in cd163d4 doesn't say.

That's a valid option.

The primary motivation was that Hannes Sixt had to step in and point
out yet again that the high-level memory allocators should not be
called in anything that could be in a die handler code path.  I was on
the thread, but I don't remember the topic (ah, Jonathan has stepped
in with the answer).  I do remember that I was not the only one who
had forgotten about that rule though.

We didn't actually have someone report that they encountered infinite
recursion, but it seemed easy enough to add a check for it, so...

Unfortunately, I didn't realize that the async functions installed
their own die handler which may not exit the process, allowing die to
be called multiple times.

To implement this check correctly/completely (i.e. detect recursion in
the main thread as well as in any child threads), I think you really
do need to use thread-local storage as you mentioned in 3/3 which
could look something like:

   static pthread_key_t dying;
   static pthread_once_t dying_once = PTHREAD_ONCE_INIT;

   void setup_die_counter(void)
   {
           pthread_key_create(&dying, NULL);
   }

   check_die_recursion(void)
   {
           pthread_once(&dying_once, setup_die_counter);
           if (pthread(getspecific(dying)) {
                   puts("BUG: recursion...");
                   exit(128);
           }

           pthread_setspecific(dying, &dying);
   }

or maybe the setup could be performed in set_die_routine(), but it
does kinda seem like overkill for a "nicety" like this.  So maybe
checking for recursion in just the main thread as this series does is
better than nothing.

-Brandon

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help