[PATCH v5 1/3] arm64/ras: support sea error recovery
From: james.morse@arm.com (James Morse)
Date: 2018-01-30 19:21:47
Also in:
linux-acpi, lkml
Hi Xie XiuQi,

On 26/01/18 12:31, Xie XiuQi wrote:
With the ARMv8.2 RAS Extension, SEAs are usually triggered when memory errors are consumed. In the existing flow, an error consumed in the kernel leads directly to a panic, and if it occurs in user-space we just kill the process. But there is a class of errors for which it is not necessary to kill the process: you can recover and continue to run it. For example, when instruction data is corrupted, the memory page may be read-only and unmodified, and the disk may still have the correct data, so you can simply drop the page and reload it when necessary.
With firmware-first support, we do all this...
So this patchset just tries to solve that problem: if the error is consumed in user-space and occurs on a clean page, you can drop the memory page without killing the process. If the corrupted page is clean, it is just dropped and we return to user-space without side effects. If the corrupted page is dirty, memory_failure() will send SIGBUS with code=BUS_MCEERR_AR. Without this patchset, do_sea() will just send SIGBUS, so the process is killed in the same place.
... but this happens too. I agree it's something we should fix, but I don't think this is the best way to do it.

This series is pulling the memory-failure-queue details back into the arch-code to build a second list, that gets processed as extra work when we return to user-space.

The root of the issue is ghes_notify_sea() claims the notification as something APEI has dealt with, ... but it hasn't done it yet. The signals will be generated by something currently stuck in a queue. (Evidently x86 doesn't handle synchronous errors like this using firmware-first).

I think a smaller fix is to give the queues that may be holding the memory_failure() work a kick as part of the code that calls ghes_notify_sea(). This means that by the time we return to do_sea() ghes_notify_sea()'s claim that APEI has dealt with it is true as any generated signals are pending. We can then skip the existing SIGBUS generation code.
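
To make the ordering concrete, here is a rough userspace model of that idea. It is only a sketch: every toy_* name is hypothetical and stands in for the kernel piece it is named after, the queue is a plain array rather than the real memory_failure() queue, and "SIGBUS" is just a flag on a fake task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of the toy_* names are kernel APIs. */
#define TOY_QUEUE_LEN 8

struct toy_task {
	bool sigbus_pending;	/* stands in for a pending SIGBUS */
	int sigbus_count;	/* how many times SIGBUS was raised */
};

struct toy_mf_entry {
	unsigned long pfn;
	bool dirty;
};

struct toy_mf_queue {
	struct toy_mf_entry entry[TOY_QUEUE_LEN];
	size_t count;
};

/* NMI-like context: only record the work, never process it here. */
static bool toy_ghes_notify_sea(struct toy_mf_queue *q,
				unsigned long pfn, bool dirty)
{
	if (q->count == TOY_QUEUE_LEN)
		return false;	/* not claimed: nowhere to queue it */

	q->entry[q->count].pfn = pfn;
	q->entry[q->count].dirty = dirty;
	q->count++;
	return true;		/* claimed, but the work is still queued */
}

/* Process context: run the queued work so any signal becomes pending. */
static void toy_drain_mf_queue(struct toy_mf_queue *q, struct toy_task *t)
{
	assert(q->count <= TOY_QUEUE_LEN);

	for (size_t i = 0; i < q->count; i++) {
		if (q->entry[i].dirty) {
			/* Dirty page: the data is lost, signal the task. */
			t->sigbus_pending = true;
			t->sigbus_count++;
		}
		/* Clean page: dropped and reloaded later, no signal. */
	}
	q->count = 0;
}

/* Model of the suggested do_sea() ordering. */
static void toy_do_sea(struct toy_mf_queue *q, struct toy_task *t,
		       unsigned long pfn, bool dirty)
{
	if (toy_ghes_notify_sea(q, pfn, dirty)) {
		toy_drain_mf_queue(q, t);	/* make the claim true */
		return;				/* skip the fallback SIGBUS */
	}

	/* Not claimed: fall back to the unconditional SIGBUS. */
	t->sigbus_pending = true;
	t->sigbus_count++;
}
```

In this model a clean page leaves the task with no signal pending, a dirty page leaves exactly one, and in both cases the queue is empty by the time toy_do_sea() returns, which is the property the real code would need.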
Because memory_failure() may sleep, we can not call it directly in SEA
(this one is more serious, I've attempted to fix it by moving all NMI-like GHES-notifications to use the estatus queue).
exception context. So we save the faulting physical address associated with a process in the ghes handler and set __TIF_SEA_NOTIFY. When we return from SEA exception context and get into do_notify_resume() before the process runs, we can check it and call memory_failure() to do the recovery.
It's safe, because we are in process context.
I think this is the trick. When we take a Synchronous-external-abort out of userspace, we're in process context too. We can add helpers to drain the memory_failure_queue which can be called from do_sea() when we know we're preemptible and interrupts-et-al are unmasked.

Thanks,

James

[0] https://www.spinics.net/lists/linux-acpi/msg80149.html
---
 arch/arm64/Kconfig                   |  11 +++
 arch/arm64/include/asm/ras.h         |  23 ++++++
 arch/arm64/include/asm/thread_info.h |   4 +-
 arch/arm64/kernel/Makefile           |   1 +
 arch/arm64/kernel/ras.c              | 142 +++++++++++++++++++++++++++++++++++
 arch/arm64/kernel/signal.c           |   7 ++
 arch/arm64/mm/fault.c                |  27 +++++--
 drivers/acpi/apei/ghes.c             |   8 +-
 include/acpi/ghes.h                  |   3 +
 9 files changed, 216 insertions(+), 10 deletions(-)
 create mode 100644 arch/arm64/include/asm/ras.h
 create mode 100644 arch/arm64/kernel/ras.c