Thread: 21 messages, 5 authors, 2019-06-29

Re: [PATCH 05/13] powerpc/mce: Allow notifier callback to handle MCE

From: Mahesh Jagannath Salgaonkar <hidden>
Date: 2019-06-21 07:08:23

On 6/21/19 6:27 AM, Santosh Sivaraj wrote:
From: Reza Arbab <redacted>

If a notifier returns NOTIFY_STOP, consider the MCE handled, just as we
do when machine_check_early() returns 1.

Signed-off-by: Reza Arbab <redacted>
---
 arch/powerpc/include/asm/asm-prototypes.h |  2 +-
 arch/powerpc/kernel/exceptions-64s.S      |  3 +++
 arch/powerpc/kernel/mce.c                 | 28 ++++++++++++++++-------
 3 files changed, 24 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index f66f26ef3ce0..49ee8f08de2a 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -72,7 +72,7 @@ void machine_check_exception(struct pt_regs *regs);
 void emulation_assist_interrupt(struct pt_regs *regs);
 long do_slb_fault(struct pt_regs *regs, unsigned long ea);
 void do_bad_slb_fault(struct pt_regs *regs, unsigned long ea, long err);
-void machine_check_notify(struct pt_regs *regs);
+long machine_check_notify(struct pt_regs *regs);
 
 /* signals, syscalls and interrupts */
 long sys_swapcontext(struct ucontext __user *old_ctx,
diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 2e56014fca21..c83e38a403fd 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -460,6 +460,9 @@ EXC_COMMON_BEGIN(machine_check_handle_early)
 
 	addi	r3,r1,STACK_FRAME_OVERHEAD
 	bl	machine_check_notify
+	ld	r11,RESULT(r1)
+	or	r3,r3,r11
+	std	r3,RESULT(r1)
 
 	ld	r12,_MSR(r1)
 BEGIN_FTR_SECTION
diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 0ab171b41ede..912efe58e0b1 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -647,16 +647,28 @@ long hmi_exception_realmode(struct pt_regs *regs)
 	return 1;
 }
 
-void machine_check_notify(struct pt_regs *regs)
+long machine_check_notify(struct pt_regs *regs)
 {
-	struct machine_check_event evt;
+	int index = __this_cpu_read(mce_nest_count) - 1;
+	struct machine_check_event *evt;
+	int rc;
 
-	if (!get_mce_event(&evt, MCE_EVENT_DONTRELEASE))
-		return;
+	if (index < 0 || index >= MAX_MC_EVT)
+		return 0;
+
+	evt = this_cpu_ptr(&mce_event[index]);
 
-	blocking_notifier_call_chain(&mce_notifier_list, 0, &evt);
+	rc = blocking_notifier_call_chain(&mce_notifier_list, 0, evt);
+	if (rc & NOTIFY_STOP_MASK) {
+		evt->disposition = MCE_DISPOSITION_RECOVERED;
+		regs->msr |= MSR_RI;

What is the reason for setting MSR_RI? I don't think this is a good
idea. MSR_RI = 0 means the system took the MCE interrupt while the SRR0
and SRR1 contents were still live, and they were overwritten by the MCE
interrupt. Hence the interrupt is unrecoverable irrespective of whether
the machine check handler recovers from the error or not.

Thanks,
-Mahesh.
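
To make the objection above concrete, here is a minimal user-space sketch
of one way the notifier result could be honoured without touching MSR_RI.
It is not the patch author's code and not the fix that was eventually
merged. Every sketch_* / SKETCH_* name is hypothetical, and the constant
values are stand-ins chosen for illustration, not copied from kernel
headers. The only assumption carried over from the review is that a clear
RI bit means SRR0/SRR1 were lost, so the event cannot be reported as
recovered.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values for this sketch only; the kernel defines the real ones. */
#define SKETCH_MSR_RI           0x2UL    /* "recoverable interrupt" bit */
#define SKETCH_NOTIFY_STOP_MASK 0x8000   /* notifier asked to stop the chain */

enum sketch_disposition {
	SKETCH_NOT_RECOVERED,
	SKETCH_RECOVERED,
};

/* Hypothetical, reduced versions of pt_regs and machine_check_event. */
struct sketch_regs {
	unsigned long msr;
};

struct sketch_event {
	enum sketch_disposition disposition;
};

/*
 * Hypothetical helper: returns 1 if the MCE may be treated as handled,
 * 0 otherwise. Unlike the quoted hunk, it never sets the RI bit. If RI
 * is clear, SRR0/SRR1 were overwritten, so the notifier's verdict is
 * ignored and the event stays not-recovered.
 */
static long sketch_notify_result(struct sketch_regs *regs,
				 struct sketch_event *evt, int rc)
{
	assert(regs != NULL && evt != NULL);

	/* No notifier claimed the event. */
	if (!(rc & SKETCH_NOTIFY_STOP_MASK))
		return 0;

	/* Saved state was lost: unrecoverable whatever the notifier says. */
	if (!(regs->msr & SKETCH_MSR_RI))
		return 0;

	evt->disposition = SKETCH_RECOVERED;
	return 1;
}
```

With regs->msr == 0 and rc carrying the stop mask, the helper returns 0
and leaves both the disposition and the MSR unchanged. With the RI bit
set, the same rc yields 1 and marks the event recovered.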