Re: hardcoded SIGSEGV in __die() ?
From: Joakim Tjernlund <hidden>
Date: 2020-03-27 10:12:44
On Thu, 2020-03-26 at 11:28 +1100, Michael Ellerman wrote:
Joakim Tjernlund [off-list ref] writes:
On Mon, 2020-03-23 at 15:45 +0100, Christophe Leroy wrote:
On 23/03/2020 at 15:43, Christophe Leroy wrote:
On 23/03/2020 at 15:17, Joakim Tjernlund wrote:
In __die(), see below, there is this call to notify_send() with SIGSEGV hardcoded. This seems odd to me, as the variable "err" holds the true signal (in my case SIGBUS). Should SIGSEGV not be replaced with the true signal number?

As far as I can see, it comes from
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=66fcb1059
and
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ae87221d3ce49d9de1e43756da834fd0bf05a2ad
shows it is (was?) similar on x86.

I tried to follow that chain, thinking it would end up sending a signal to user space, but I cannot see that happening. It seems to be related to debugging. In short, I cannot see any signal being delivered to user space. If so, that would explain why our user space process never dies. Is there a signal hidden in the machine_check handler for SIGBUS that I cannot see?

It's platform specific. What platform are you on?
I am on e500, e5500(e500mc) and 83xx :)
See the ppc_md & cur_cpu_spec calls here:
void machine_check_exception(struct pt_regs *regs)
{
	int recover = 0;
	bool nested = in_nmi();
	if (!nested)
		nmi_enter();
	__this_cpu_inc(irq_stat.mce_exceptions);
	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
	/* See if any machine dependent calls. In theory, we would want
	 * to call the CPU first, and call the ppc_md. one if the CPU
	 * one returns a positive number. However there is existing code
	 * that assumes the board gets a first chance, so let's keep it
	 * that way for now and fix things later. --BenH.
	 */
	if (ppc_md.machine_check_exception)
		recover = ppc_md.machine_check_exception(regs);
	else if (cur_cpu_spec->machine_check)
		recover = cur_cpu_spec->machine_check(regs);
	if (recover > 0)
		goto bail;
Either the ppc_md or cpu_spec handlers can send a signal, but after a
bit of grepping I think only the pseries and powernv ones do.

Seems so.
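The dispatch order in the excerpt above can be modelled in user space to see why a platform without a signalling handler never bails out early. This is only a sketch: `mock_machdep`, `mock_cpu_spec`, `mock_machine_check()`, both handlers and the `signals_sent` counter are hypothetical stand-ins, not kernel code, and the "signal" is just a counter increment.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pt_regs. */
struct pt_regs { unsigned long nip; };

typedef int (*mce_handler_t)(struct pt_regs *regs);

/* Mock versions of ppc_md and cur_cpu_spec with only the hooks used here. */
struct mock_machdep  { mce_handler_t machine_check_exception; };
struct mock_cpu_spec { mce_handler_t machine_check; };

/* Counts "signals" a handler sent; stands in for real signal delivery. */
static int signals_sent;

/* Assumed behaviour of a handler that signals the task and recovers. */
static int board_handler_sends_signal(struct pt_regs *regs)
{
	(void)regs;
	signals_sent++;
	return 1;	/* positive: recovered */
}

/* Assumed behaviour of a handler that only reports and never signals. */
static int cpu_handler_report_only(struct pt_regs *regs)
{
	(void)regs;
	return 0;	/* not recovered */
}

/*
 * Same order as the excerpt: the board hook gets the first chance, the
 * CPU hook runs only when no board hook is set.
 * Returns 1 for the "goto bail" case, 0 when execution would continue
 * towards the die()/oops path.
 */
static int mock_machine_check(const struct mock_machdep *md,
			      const struct mock_cpu_spec *spec,
			      struct pt_regs *regs)
{
	int recover = 0;

	if (md->machine_check_exception)
		recover = md->machine_check_exception(regs);
	else if (spec->machine_check)
		recover = spec->machine_check(regs);

	return recover > 0;
}
```

With only a report-only handler installed, `mock_machine_check()` returns 0 and nothing is counted in `signals_sent`, which matches the observation that the process is never signalled on such platforms.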
If you get into die() then it's an oops, which is not the same as a normal signal.
Exactly, and the die/OOPS does not seem to work as intended either. The system tries to limp along and generates more similar OOPSes and may even hang.
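On the earlier question about the hardcoded SIGSEGV: as far as I understand the die notifier mechanism, the signal number is only handed to registered callbacks as a data field, and nothing in the chain itself delivers a signal to a task. The user-space model below illustrates that reading. All names (`mock_die_args`, `mock_notify_die()`, `debugger_like_callback()`, the `MOCK_NOTIFY_*` values) are hypothetical and simplified, not the kernel definitions.

```c
#include <signal.h>
#include <stddef.h>

/* Hypothetical return codes; the real kernel values differ. */
#define MOCK_NOTIFY_DONE 0
#define MOCK_NOTIFY_STOP 1

/* Simplified stand-in for the argument block passed along the chain. */
struct mock_die_args {
	const char *str;	/* description of the fault */
	long err;		/* error code from the caller */
	int signr;		/* signal number, informational only */
};

typedef int (*mock_notifier_fn)(const struct mock_die_args *args);

/* Records what a debugger-style callback would observe. */
static int last_seen_signr;

static int debugger_like_callback(const struct mock_die_args *args)
{
	last_seen_signr = args->signr;
	return MOCK_NOTIFY_DONE;	/* let the oops path continue */
}

/*
 * Walk the chain and stop early if a callback asks for it.
 * Note that no signal is raised here: signr is just data.
 */
static int mock_notify_die(mock_notifier_fn *chain, size_t n,
			   const char *str, long err, int signr)
{
	struct mock_die_args args = { str, err, signr };

	for (size_t i = 0; i < n; i++) {
		int ret = chain[i](&args);

		if (ret == MOCK_NOTIFY_STOP)
			return ret;
	}
	return MOCK_NOTIFY_DONE;
}
```

In this model, calling `mock_notify_die()` with `err` set to SIGBUS and `signr` set to SIGSEGV only changes what the callback sees. The calling process keeps running either way.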
cheers