[PATCH v6 17/18] mm/memory-failure: increase queued recovery work's priority

From: bp@alien8.de (Borislav Petkov)
Date: 2018-10-15 16:49:30
Also in: kvmarm, linux-acpi, linux-mm

+ Peter.

On Fri, Sep 21, 2018 at 11:17:04PM +0100, James Morse wrote:

arm64 can take an NMI-like error notification when user-space steps in
some corrupt memory. APEI's GHES code will call memory_failure_queue()
to schedule the recovery work. We then return to user-space, possibly
taking the fault again.

Currently the arch code unconditionally signals user-space from this
path, so we don't get stuck in this loop, but the affected process
never benefits from memory_failure()'s recovery work. To fix this we
need to know the recovery work will run before we get back to user-space.

Increase the priority of the recovery work by scheduling it on the
system_highpri_wq, then try to bump the current task off this CPU
so that the recovery work starts immediately.

Reported-by: Xie XiuQi <redacted>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <redacted>
Tested-by: Tyler Baicar <redacted>
Tested-by: gengdongjiu <redacted>
CC: Xie XiuQi <redacted>
CC: gengdongjiu <redacted>
---
 mm/memory-failure.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0cd3de3550f0..4e7b115cea5a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c

@@ -56,6 +56,7 @@
 #include <linux/memory_hotplug.h>
 #include <linux/mm_inline.h>
 #include <linux/memremap.h>
+#include <linux/preempt.h>
 #include <linux/kfifo.h>
 #include <linux/ratelimit.h>
 #include <linux/page-isolation.h>

@@ -1454,6 +1455,7 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
  */
 void memory_failure_queue(unsigned long pfn, int flags)
 {
+	int cpu = smp_processor_id();
 	struct memory_failure_cpu *mf_cpu;
 	unsigned long proc_flags;
 	struct memory_failure_entry entry = {

@@ -1463,11 +1465,14 @@ void memory_failure_queue(unsigned long pfn, int flags)
 
 	mf_cpu = &get_cpu_var(memory_failure_cpu);
 	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
-	if (kfifo_put(&mf_cpu->fifo, entry))
-		schedule_work_on(smp_processor_id(), &mf_cpu->work);
-	else
+	if (kfifo_put(&mf_cpu->fifo, entry)) {
+		queue_work_on(cpu, system_highpri_wq, &mf_cpu->work);
+		set_tsk_need_resched(current);
+		preempt_set_need_resched();

What guarantees the workqueue would run before the process? I see this:

``WQ_HIGHPRI``
  Work items of a highpri wq are queued to the highpri
  worker-pool of the target cpu.  Highpri worker-pools are
  served by worker threads with elevated nice level.

but is that enough?
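
For readers following the hunk, here is a minimal user-space sketch of the control flow it introduces: put the entry in a bounded FIFO, and only if that succeeds queue the work and ask for a reschedule. Everything in it (struct mf_cpu_model, mf_fifo_put(), mf_queue_model(), mf_drain_model(), the capacity of 4) is hypothetical and only stands in for the kernel's per-CPU kfifo, queue_work_on() and need_resched flags. It has no scheduler, so it does not show whether the worker runs before the task returns to user-space, which is the open question above.

```c
#include <stdbool.h>
#include <stdio.h>

#define MF_FIFO_SIZE 4	/* hypothetical capacity, stands in for the per-CPU kfifo */

struct mf_entry {
	unsigned long pfn;
	int flags;
};

struct mf_cpu_model {
	struct mf_entry fifo[MF_FIFO_SIZE];
	unsigned int head, tail;	/* free-running; tail - head == entries queued */
	bool work_queued;		/* models queue_work_on(cpu, system_highpri_wq, ...) */
	bool need_resched;		/* models set_tsk_need_resched() + preempt_set_need_resched() */
};

/* Models kfifo_put(): returns false when the FIFO is full. */
static bool mf_fifo_put(struct mf_cpu_model *m, struct mf_entry e)
{
	if (m->tail - m->head == MF_FIFO_SIZE)
		return false;
	m->fifo[m->tail % MF_FIFO_SIZE] = e;
	m->tail++;
	return true;
}

/* Models the patched memory_failure_queue(): queue the work and request
 * a reschedule only when the entry was actually stored. */
static bool mf_queue_model(struct mf_cpu_model *m, unsigned long pfn, int flags)
{
	struct mf_entry e = { .pfn = pfn, .flags = flags };

	if (mf_fifo_put(m, e)) {
		m->work_queued = true;
		m->need_resched = true;	/* a request only; nothing runs here */
		return true;
	}
	fprintf(stderr, "model: fifo overflow for pfn %#lx\n", pfn);
	return false;
}

/* Models the work item draining the FIFO; returns the number of entries handled. */
static unsigned int mf_drain_model(struct mf_cpu_model *m)
{
	unsigned int handled = 0;

	while (m->head != m->tail) {
		m->head++;
		handled++;
	}
	m->work_queued = false;
	return handled;
}
```

In this model, need_resched is just a flag left behind for someone else to act on. Between mf_queue_model() returning and mf_drain_model() being called, nothing in the model orders the two.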

-- 
Regards/Gruss,
    Boris.
