[PATCH v6 00/18] APEI in_nmi() rework

[PATCH v6 00/18] APEI in_nmi() rework · James Morse <james.morse@arm.com> · 2018-09-21
[PATCH v6 01/18] ACPI / APEI: Move the estatus queue code up, and under its own ifdef · James Morse <james.morse@arm.com> · 2018-09-21
[PATCH v6 02/18] ACPI / APEI: Generalise the estatus queue's add/remove and notify code · James Morse <james.morse@arm.com> · 2018-09-21
[PATCH v6 03/18] ACPI / APEI: don't wait to serialise with oops messages when panic()ing · James Morse <james.morse@arm.com> · 2018-09-21
[PATCH v6 04/18] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 04/18] ACPI / APEI: Switch NOTIFY_SEA to use the estatus queue · Borislav Petkov <bp@alien8.de> · 2018-09-28
[PATCH v6 05/18] ACPI / APEI: Make estatus queue a Kconfig symbol · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 05/18] ACPI / APEI: Make estatus queue a Kconfig symbol · Borislav Petkov <bp@alien8.de> · 2018-10-01
Re: [PATCH v6 05/18] ACPI / APEI: Make estatus queue a Kconfig symbol · James Morse <james.morse@arm.com> · 2018-10-03
Re: [PATCH v6 05/18] ACPI / APEI: Make estatus queue a Kconfig symbol · Borislav Petkov <bp@alien8.de> · 2018-10-04
Re: [PATCH v6 05/18] ACPI / APEI: Make estatus queue a Kconfig symbol · James Morse <james.morse@arm.com> · 2018-10-12
Re: [PATCH v6 05/18] ACPI / APEI: Make estatus queue a Kconfig symbol · Borislav Petkov <bp@alien8.de> · 2018-10-12
[PATCH v6 06/18] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 06/18] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing · Borislav Petkov <bp@alien8.de> · 2018-10-12
Re: [PATCH v6 06/18] KVM: arm/arm64: Add kvm_ras.h to collect kvm specific RAS plumbing · James Morse <james.morse@arm.com> · 2018-10-12
[PATCH v6 07/18] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 07/18] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface · Borislav Petkov <bp@alien8.de> · 2018-10-12
Re: [PATCH v6 07/18] arm64: KVM/mm: Move SEA handling behind a single 'claim' interface · James Morse <james.morse@arm.com> · 2018-10-12
[PATCH v6 08/18] ACPI / APEI: Move locking to the notification helper · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 08/18] ACPI / APEI: Move locking to the notification helper · Borislav Petkov <bp@alien8.de> · 2018-10-12
[PATCH v6 09/18] ACPI / APEI: Let the notification helper specify the fixmap slot · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 09/18] ACPI / APEI: Let the notification helper specify the fixmap slot · Borislav Petkov <bp@alien8.de> · 2018-10-12
[PATCH v6 10/18] ACPI / APEI: preparatory split of ghes->estatus · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 10/18] ACPI / APEI: preparatory split of ghes->estatus · Borislav Petkov <bp@alien8.de> · 2018-10-12
[PATCH v6 11/18] ACPI / APEI: Remove silent flag from ghes_read_estatus() · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 11/18] ACPI / APEI: Remove silent flag from ghes_read_estatus() · Borislav Petkov <bp@alien8.de> · 2018-10-12
[PATCH v6 12/18] ACPI / APEI: Don't store CPER records physical address in struct ghes · James Morse <james.morse@arm.com> · 2018-09-21
[PATCH v6 13/18] ACPI / APEI: Don't update struct ghes' flags in read/clear estatus · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 13/18] ACPI / APEI: Don't update struct ghes' flags in read/clear estatus · Borislav Petkov <bp@alien8.de> · 2018-10-12
[PATCH v6 14/18] ACPI / APEI: Split ghes_read_estatus() to read CPER length · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 14/18] ACPI / APEI: Split ghes_read_estatus() to read CPER length · Borislav Petkov <bp@alien8.de> · 2018-10-12
[PATCH v6 15/18] ACPI / APEI: Only use queued estatus entry during _in_nmi_notify_one() · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 15/18] ACPI / APEI: Only use queued estatus entry during _in_nmi_notify_one() · Borislav Petkov <bp@alien8.de> · 2018-10-12
[PATCH v6 16/18] ACPI / APEI: Split fixmap pages for arm64 NMI-like notifications · James Morse <james.morse@arm.com> · 2018-09-21
[PATCH v6 17/18] mm/memory-failure: increase queued recovery work's priority · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 17/18] mm/memory-failure: increase queued recovery work's priority · Borislav Petkov <bp@alien8.de> · 2018-10-15
Re: [PATCH v6 17/18] mm/memory-failure: increase queued recovery work's priority · Peter Zijlstra <peterz@infradead.org> · 2018-10-16
[PATCH v6 18/18] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work · James Morse <james.morse@arm.com> · 2018-09-21
Re: [PATCH v6 00/18] APEI in_nmi() rework · Borislav Petkov <bp@alien8.de> · 2018-09-25
Re: [PATCH v6 00/18] APEI in_nmi() rework · James Morse <james.morse@arm.com> · 2018-10-03
Re: [PATCH v6 00/18] APEI in_nmi() rework · Borislav Petkov <bp@alien8.de> · 2018-10-04

From: james.morse@arm.com (James Morse)
Date: 2018-10-03 17:50:48
Also in: kvmarm, linux-acpi, linux-mm

Hi Boris,

On 25/09/18 13:45, Borislav Petkov wrote:

> On Fri, Sep 21, 2018 at 11:16:47PM +0100, James Morse wrote:
>> Hello,
>>
>> The GHES driver has collected quite a few bugs:
>>
>> ghes_proc() at ghes_probe() time can be interrupted by an NMI that
>> will clobber the ghes->estatus fields, flags, and the buffer_paddr.
>>
>> ghes_copy_tofrom_phys() uses in_nmi() to decide which path to take. arm64's
>> SEA takes both paths, depending on what it interrupted.
>>
>> There is no guarantee that queued memory_failure() errors will be processed
>> before this CPU returns to user-space.
>>
>> x86 can't TLBI from interrupt-masked code, which this driver does all the
>> time.
>>
>> This series aims to fix the first three, with an eye to fixing the
>> last one with a follow-up series.
>>
>> Previous postings included the SDEI notification calls, which I haven't
>> finished re-testing. This series is big enough as it is.
>
> Yeah, and everywhere I look, this thing looks overengineered. Like,
> for example, what's the purpose of this ghes_esource_prealloc_size()
> computing a size each time the pool changes size?

It computes the size to grow the pool by, because each error source described
by a GHES entry has its own worst-case size.

Today ghes_nmi_add() does this each time it's called. You could have multiple
GHES entries in the HEST that describe NMI as the notification. The worst-case
size for the records is described in the GHES entry, and could be different for
each one. (See error_block_length and records_to_preallocate in table 18-379 of
ACPI v6.2.)

These different error sources could be delivered on different CPUs at the same
time, so each needs its own pre-allocated reserved memory. ghes_notify_nmi()'s
atomic_add_unless() suggests this can happen on x86, but I don't know the
arch specifics. It definitely can happen on arm64.
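A minimal userspace sketch of the "only one CPU at a time" gate that an atomic_add_unless(v, 1, 1) check provides. It uses C11 atomics instead of the kernel's atomic_t. The compare-and-swap from 0 to 1 is only equivalent to atomic_add_unless(v, 1, 1) under the assumption that the counter only ever holds 0 or 1. try_enter(), leave() and in_nmi_handler are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical gate: 0 = nobody handling, 1 = one CPU is handling. */
static atomic_int in_nmi_handler = 0;

/* Increment unless already 1; returns false if another CPU got there first.
 * Equivalent to atomic_add_unless(v, 1, 1) only while v is 0 or 1. */
static bool try_enter(atomic_int *v)
{
	int expected = 0;

	return atomic_compare_exchange_strong(v, &expected, 1);
}

/* Drop the gate; it must have been held. */
static void leave(atomic_int *v)
{
	int old = atomic_fetch_sub(v, 1);

	assert(old == 1);
	(void)old;
}
```

If such a gate exists, the code expects a second CPU to take the notification while the first is still handling it. On arm64 the concurrent notifications can come from different error sources, and each needs its own reserved memory.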

> AFAICT, this size can be computed exactly *once* at driver init and be
> done with it. Right?

We could do two passes of the HEST to pre-compute the total size of this
estatus-queue memory, allocate it, then do the notification registration stuff.
But this doesn't really work with the way this driver acts as the platform driver
for a GHES device...
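For comparison, a sketch of the two-pass approach in isolation. It assumes a single point where every GHES entry is visible at once, which the real platform-driver probe flow does not provide. struct ghes_entry, entry_size(), register_notification() and setup_all() are hypothetical stand-ins. The registration hook only counts calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PREALLOC_CAP 65536ULL /* assumed 64K cap */

/* Hypothetical stand-in for a GHES entry's two sizing fields. */
struct ghes_entry {
	uint32_t error_block_length;
	uint32_t records_to_preallocate;
};

/* Hypothetical registration hook; here it only counts calls. */
static size_t registered;

static void register_notification(const struct ghes_entry *g)
{
	(void)g;
	registered++;
}

/* Worst-case bytes for one source, capped at PREALLOC_CAP. */
static uint64_t entry_size(const struct ghes_entry *g)
{
	uint64_t block = g->error_block_length < PREALLOC_CAP ?
			 g->error_block_length : PREALLOC_CAP;
	uint64_t records = g->records_to_preallocate ? g->records_to_preallocate : 1;
	uint64_t size = block * records;

	return size < PREALLOC_CAP ? size : PREALLOC_CAP;
}

static void *setup_all(const struct ghes_entry *e, size_t n, uint64_t *pool_size)
{
	uint64_t total = 0;
	void *pool;

	assert(e != NULL || n == 0);

	/* Pass 1: sum every source's worst case. */
	for (size_t i = 0; i < n; i++)
		total += entry_size(&e[i]);

	/* Allocate the whole pool once (ask for 1 byte if total is 0). */
	pool = malloc(total ? (size_t)total : 1);
	if (!pool)
		return NULL;
	*pool_size = total;

	/* Pass 2: only now register the notifications. */
	for (size_t i = 0; i < n; i++)
		register_notification(&e[i]);

	return pool;
}
```

The ordering is the point: nothing can be delivered until the pool is sized for every source. In the driver, each GHES is probed separately, so there is no obvious place for pass 1.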

The non-GHES HEST entries have a "number of records to pre-allocate" too. We
could make this memory pool something hest.c looks after, but I can't see
whether the other error sources use those values.

Hmmm,
The size is capped at 64K, so we could ignore the firmware's description of the
memory requirements and allocate SZ_64K each time. Doing it per-GHES is still
the only way to avoid allocating NMI-safe memory for IRQ-notified error sources.
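A sketch of that fixed-size alternative, assuming SZ_64K per GHES entry. The pool only grows for entries whose notification is NMI-like. The enum values stand in for the HEST notification types. They, is_nmi_like() and pool_growth() are hypothetical names for this example. SZ_64K is defined locally with the same value as the kernel's linux/sizes.h macro.

```c
#include <assert.h>
#include <stddef.h>

#define SZ_64K 0x10000UL /* local copy of the kernel's SZ_64K value */

/* Hypothetical stand-ins for HEST notification types. */
enum notify_type { NOTIFY_POLLED, NOTIFY_IRQ, NOTIFY_NMI, NOTIFY_SEA };

/* Only sources that can interrupt with IRQs masked need NMI-safe memory. */
static int is_nmi_like(enum notify_type t)
{
	return t == NOTIFY_NMI || t == NOTIFY_SEA;
}

/* Grow the pool by a fixed 64K per NMI-like GHES entry, ignoring the
 * firmware's per-entry size description; IRQ and polled entries add nothing. */
static unsigned long pool_growth(const enum notify_type *types, size_t n)
{
	unsigned long total = 0;

	assert(types != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (is_nmi_like(types[i]))
			total += SZ_64K;
	return total;
}
```

This trades some over-allocation for simpler sizing. The per-entry loop is what keeps IRQ-notified sources out of the NMI-safe pool.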


Thanks,

James
