Thread: 21 messages, 4 authors, 2018-10-13

Re: [PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt

From: Christophe LEROY <hidden>
Date: 2018-10-11 14:25:27


On 09/10/2018 at 14:14, Nicholas Piggin wrote:
On Tue, 9 Oct 2018 14:01:37 +0200
Christophe LEROY [off-list ref] wrote:
On 09/10/2018 at 13:16, Nicholas Piggin wrote:
On Tue, 9 Oct 2018 09:36:18 +0000
Christophe Leroy [off-list ref] wrote:
   
On 10/09/2018 05:30 AM, Nicholas Piggin wrote:
On Tue, 9 Oct 2018 06:46:30 +0200
Christophe LEROY [off-list ref] wrote:
      
On 09/10/2018 at 06:32, Nicholas Piggin wrote:
On Mon, 8 Oct 2018 17:39:11 +0200
Christophe LEROY [off-list ref] wrote:
         
Hi Nick,

On 19/07/2017 at 08:59, Nicholas Piggin wrote:
Use nmi_enter similarly to system reset interrupts. This uses the NMI
printk buffers and turns off various debugging facilities, which
helps avoid tripping on ourselves or other CPUs.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
      arch/powerpc/kernel/traps.c | 9 ++++++---
      1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 2849c4f50324..6d31f9d7c333 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
      
      void machine_check_exception(struct pt_regs *regs)
      {
-	enum ctx_state prev_state = exception_enter();
      	int recover = 0;
+	bool nested = in_nmi();
+	if (!nested)
+		nmi_enter();
This alters preempt_count, so when die() is called
in_interrupt() returns true although the trap didn't happen in an
interrupt, and oops_end() panics with "Fatal exception in interrupt"
instead of gently sending SIGBUS to the faulting app.
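The effect described above can be reproduced with a small user-space model. This is only a sketch: the bit positions are an assumption modeled on include/linux/preempt.h from kernels of that period, and every model_* name is hypothetical rather than a kernel symbol.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout, modeled on include/linux/preempt.h of that era. */
#define SOFTIRQ_SHIFT  8
#define HARDIRQ_SHIFT  16
#define NMI_SHIFT      20

#define SOFTIRQ_MASK   (0xffUL << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK   (0x0fUL << HARDIRQ_SHIFT)
#define NMI_MASK       (0x01UL << NMI_SHIFT)

#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET     (1UL << NMI_SHIFT)

/* Stand-in for the per-task preempt_count (hypothetical name). */
static unsigned long model_preempt_count;

static bool model_in_nmi(void)
{
	return (model_preempt_count & NMI_MASK) != 0;
}

/* True if any softirq, hardirq or NMI bit is set. */
static bool model_in_interrupt(void)
{
	return (model_preempt_count &
		(SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)) != 0;
}

/* Models only the preempt_count side effect of nmi_enter(). */
static void model_nmi_enter(void)
{
	assert(!model_in_nmi());	/* this model refuses to nest */
	model_preempt_count += NMI_OFFSET + HARDIRQ_OFFSET;
}

static void model_nmi_exit(void)
{
	assert(model_in_nmi());
	model_preempt_count -= NMI_OFFSET + HARDIRQ_OFFSET;
}
```

Starting from a count of zero (plain task context), model_in_interrupt() is false before model_nmi_enter() and true after it, which is the condition that sends die() down the panic path.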
Thanks for tracking that down.
         
Any idea how to fix this?
I would say we have to deliver the sigbus by hand.

        if (user_mode(regs))
            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
        else
            die("Machine check", regs, SIGBUS);
         
And what about all the other things done by 'die()' ?

And what if it is a kernel thread ?

In one of my boards, I have a kernel thread regularly checking the HW,
and if it gets a machine check I expect it to gently stop and the die
notification to be delivered to all registered notifiers.

Before this patch, it was working well.
I guess the alternative is we could check regs->trap for machine
check in the die test. Complication is having to account for MCE
in an interrupt handler.

          if (in_interrupt()) {
                   if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
                       panic("Fatal exception in interrupt");
          }

Something like that might work for you? We need a ppc64 macro for the
MCE, and can probably add something like in_nmi_from_interrupt() for
the second part of the test.
Don't know, I'm away from home on business trip so I won't be able to
test anything before next week. However it looks more or less like a
hack, doesn't it ?
I thought it seemed okay (with the right functions added). Actually it
could be a bit nicer to do this, then it works generally :

           if (in_interrupt()) {
                    if (!in_nmi() || in_nmi_from_interrupt())
                        panic("Fatal exception in interrupt");
           }

Yes looks nice, but:
1/ What is in_nmi_from_interrupt()? Is it (in_nmi() && (in_irq() ||
in_softirq()))?
   return (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)) != 0;

(basically just in_interrupt() with the nmi_enter undone)
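A rough user-space sketch of that idea, combined with the generalized die() test quoted earlier. The bit layout is an assumption modeled on include/linux/preempt.h of that period, and both function names are hypothetical; in_nmi_from_interrupt() was only a proposal in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout, modeled on include/linux/preempt.h of that era. */
#define SOFTIRQ_SHIFT  8
#define HARDIRQ_SHIFT  16
#define NMI_SHIFT      20

#define SOFTIRQ_MASK   (0xffUL << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK   (0x0fUL << HARDIRQ_SHIFT)
#define NMI_MASK       (0x01UL << NMI_SHIFT)
#define IRQ_COUNT_MASK (SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)

#define SOFTIRQ_OFFSET (1UL << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET     (1UL << NMI_SHIFT)

/*
 * Hypothetical helper: undo what nmi_enter() added and see whether any
 * softirq/hardirq nesting is left. Only meaningful while in NMI.
 */
static bool model_in_nmi_from_interrupt(unsigned long preempt_count)
{
	assert(preempt_count & NMI_MASK);
	return ((preempt_count & IRQ_COUNT_MASK) -
		(NMI_OFFSET + HARDIRQ_OFFSET)) != 0;
}

/* Models: if (in_interrupt()) { if (!in_nmi() || in_nmi_from_interrupt()) panic(); } */
static bool model_die_should_panic(unsigned long preempt_count)
{
	if (!(preempt_count & IRQ_COUNT_MASK))
		return false;			/* not in_interrupt() */
	if (!(preempt_count & NMI_MASK))
		return true;			/* ordinary interrupt context */
	return model_in_nmi_from_interrupt(preempt_count);
}
```

With these definitions, a machine check taken from task context (count equal to NMI_OFFSET + HARDIRQ_OFFSET) does not panic, while one taken inside a hardirq or softirq does. As point 2/ notes, the subtraction alone cannot distinguish a machine check nested inside another NMI.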
2/ what about in_nmi_from_nmi(), how do we detect that ?
Oh good point, I'm not sure. I guess we could irq_enter() in the
nested case, I think that would make in_nmi_from_interrupt()
return true.
Yes we could, but I find it ugly.

Don't you think it looks less strange to just check in_interrupt()
before calling nmi_enter()?

Christophe
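
A minimal user-space sketch of that last suggestion: sample the interrupt state before nmi_enter() changes preempt_count, and base the panic decision on the sampled value. The bit layout is an assumption modeled on include/linux/preempt.h of that period, and model_machine_check() is a hypothetical name; this is not the code that was eventually merged.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout, modeled on include/linux/preempt.h of that era. */
#define SOFTIRQ_SHIFT  8
#define HARDIRQ_SHIFT  16
#define NMI_SHIFT      20

#define SOFTIRQ_MASK   (0xffUL << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK   (0x0fUL << HARDIRQ_SHIFT)
#define NMI_MASK       (0x01UL << NMI_SHIFT)
#define IRQ_COUNT_MASK (SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)

#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET     (1UL << NMI_SHIFT)

/*
 * Returns true when a fatal machine check should panic instead of
 * signalling the task. The decision uses the state seen on entry.
 */
static bool model_machine_check(unsigned long *preempt_count)
{
	const unsigned long saved = *preempt_count;
	bool was_in_interrupt = (saved & IRQ_COUNT_MASK) != 0;
	bool nested = (saved & NMI_MASK) != 0;

	if (!nested)
		*preempt_count += NMI_OFFSET + HARDIRQ_OFFSET;	/* nmi_enter() */

	/* ... machine check handling would run here ... */

	if (!nested)
		*preempt_count -= NMI_OFFSET + HARDIRQ_OFFSET;	/* nmi_exit() */

	assert(*preempt_count == saved);	/* count must be restored */
	return was_in_interrupt;
}
```

Because the NMI bit is part of the sampled mask, a machine check that arrives while already in NMI context is also reported as "in interrupt", which covers the in_nmi_from_nmi() case raised in point 2/ without an extra irq_enter().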