Thread: 45 messages, 5 authors, 2022-01-28

Re: [dpdk-dev] [PATCH v2 1/6] eal: introduce oops handling API

From: Stephen Hemminger <stephen@networkplumber.org>
Date: 2021-08-17 15:09:32

On Tue, 17 Aug 2021 13:08:46 +0530
Jerin Jacob [off-list ref] wrote:
On Tue, Aug 17, 2021 at 9:23 AM Stephen Hemminger
[off-list ref] wrote:
On Tue, 17 Aug 2021 08:57:18 +0530
[off-list ref] wrote:
 
From: Jerin Jacob <redacted>

Introducing oops handling API with following specification
and enable stub implementation for Linux and FreeBSD.

On rte_eal_init() invocation, the EAL library installs the
oops handler for the essential signals.
The rte_oops_signals_enabled() API provides the list
of signals installed by the EAL.

This is a big change, and many applications already handle these
signals themselves. Therefore adding this needs to be opt-in
and not enabled by default.

To avoid requiring every application to register this signal handler
explicitly, and to coexist with application-specific signal handlers,
the following design has been chosen. (It is mentioned in the commit log;
I will describe it here for clarity.)

Case 1:
a) The application installs its signal handler prior to rte_eal_init().
b) The implementation stores the application-specific signal handler and
replaces it with the EAL oops handler.
c) When the application or DPDK gets a segfault, the EAL oops
handler is invoked.
d) It dumps the EAL-specific message, then calls the
application-specific signal handler
installed in step a) by the application. This avoids breaking any contract
with the application,
i.e., the behavior is the same as in the current EAL.
That is the reason for not using SA_RESETHAND (which would call SIG_DFL after
the EAL oops handler instead of the
application-specific handler).
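
The chaining described in Case 1 can be sketched with plain POSIX sigaction(). This is only an illustration of the save-and-forward idea, not the code from the patch: every demo_* name is hypothetical, and the "dump" is a single write() call standing in for whatever the EAL would print.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names throughout; none of these are DPDK APIs. */
static struct sigaction demo_prev_action;      /* action the application installed first */
static volatile sig_atomic_t demo_app_calls;   /* calls into the application handler */
static volatile sig_atomic_t demo_oops_calls;  /* calls into the oops handler */

/* Stand-in for an application handler registered before EAL init. */
static void demo_app_handler(int sig)
{
    (void)sig;
    demo_app_calls++;
}

/* Stand-in for the oops handler: dump first, then chain to the saved handler. */
static void demo_oops_handler(int sig, siginfo_t *info, void *uc)
{
    static const char msg[] = "oops: signal caught\n";

    demo_oops_calls++;
    /* write() is async-signal-safe; a real dump would go here. */
    if (write(STDERR_FILENO, msg, sizeof(msg) - 1) < 0) {
        /* nothing safe to do about a failed write inside a handler */
    }

    if (demo_prev_action.sa_flags & SA_SIGINFO) {
        if (demo_prev_action.sa_sigaction != NULL)
            demo_prev_action.sa_sigaction(sig, info, uc);
    } else if (demo_prev_action.sa_handler != SIG_DFL &&
               demo_prev_action.sa_handler != SIG_IGN) {
        demo_prev_action.sa_handler(sig);
    }
}

/* Step a): the application installs its own handler. */
static int demo_install_app_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = demo_app_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Step b): save the existing action and replace it with the oops handler. */
static int demo_install_oops_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = demo_oops_handler;
    sa.sa_flags = SA_SIGINFO;   /* no SA_RESETHAND, so the chain stays installed */
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, &demo_prev_action);
}
```

Calling demo_install_app_handler(SIGUSR1), then demo_install_oops_handler(SIGUSR1), then raise(SIGUSR1) runs both handlers once, oops handler first. The sketch does nothing when the saved action is SIG_DFL or SIG_IGN; for a real fault signal an implementation would presumably have to restore the default action and re-raise, which is left out here.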

Case 2:
a) The application installs its signal handler after rte_eal_init().
b) The EAL handler gets replaced with the application handler; the
application can then call rte_oops_decode() to decode.

To cater to the above use cases, rte_oops_signals_enabled() and
rte_oops_decode() are provided.

Here we are not breaking any contract with the application.
Do you have concerns about this design?

In our application as a service it is important not to do any backtrace
in production. We rely on other infrastructure to process coredumps.

This should be enabled by a command-line argument.
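
One possible shape for such an opt-in, sketched with getopt_long(). The flag name --demo-oops and the function demo_oops_requested() are made up for illustration and are not existing EAL options or APIs; the point is only that the default stays off and the handler is installed only when the flag is present.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true only when the hypothetical --demo-oops flag is present. */
static bool demo_oops_requested(int argc, char **argv)
{
    static const struct option long_opts[] = {
        { "demo-oops", no_argument, NULL, 'o' },  /* hypothetical flag, not an EAL option */
        { NULL, 0, NULL, 0 }
    };
    bool enabled = false;   /* default: leave signal handling to the application */
    int opt;

    optind = 1;             /* restart scanning so the function can be called again */
    opterr = 0;             /* stay quiet about options this sketch does not know */
    while ((opt = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
        if (opt == 'o')
            enabled = true;
    }
    return enabled;
}
```

With this default, a production deployment that relies on external coredump processing sees no change in behavior unless the flag is passed explicitly.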