Thread: 22 messages, 8 authors, 2018-07-03

[PATCH] arm64/acpi: Add fixup for HPE m400 quirks

From: geoff@infradead.org (Geoff Levand)
Date: 2018-06-15 17:17:18
Also in: linux-acpi

Hi James,

Just for background, this is a well-known bug in the m400's APEI/HEST
firmware.  The different distros each carry their own fix for it.  I
put this patch together to unify those fixes into a common 'upstream'
fix.

On 06/15/2018 04:14 AM, James Morse wrote:
On 13/06/18 19:22, Geoff Levand wrote:
quoted
Adds a new ACPI init routine acpi_fixup_m400_quirks that adds
a work-around for HPE ProLiant m400 APEI firmware problems.

The work-around disables APEI when CONFIG_ACPI_APEI is set and
m400 firmware is detected.  Without this fixup m400 systems
experience errors like these on startup:

  [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
  [Hardware Error]: event severity: fatal
  [Hardware Error]:  Error 0, type: fatal
  [Hardware Error]:   section_type: memory error
  [Hardware Error]:   error_status: 0x0000000000001300
"Access to a memory address which is not mapped to any component"

quoted
  [Hardware Error]:   error_type: 10, invalid address
  Kernel panic - not syncing: Fatal hardware error!
Why is this a problem?

Surely this is a valid description of an error.
The firmware bug causes this failure, not bad hardware.
(okay its not particularly useful without the physical address, but the address
is optional in that structure)

When does this happen during boot? This looks like a driver mapping some
non-existent physical address space to see if its device is present...
unsurprisingly this doesn't go well.
(might also be a typo in the DSDT)

Can't we pin down the driver that does this and fix it. Its either wrong for
everyone, or still broken after you disable APEI.

quoted
It seems unlikely there will be any m400 firmware updates to fix
this problem.
What is the problem? This patch looks like it shoots the messenger for bringing
bad news.
 
The news is incorrect, so this patch disables the source (APEI code).
quoted
diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index 7b09487ff8fb..3c315c2c7476 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -31,6 +31,8 @@
 #include <asm/cpu_ops.h>
 #include <asm/smp_plat.h>
 
+#include <acpi/apei.h>
+
 #ifdef CONFIG_ACPI_APEI
 # include <linux/efi.h>
 # include <asm/pgtable.h>
@@ -177,6 +179,33 @@ static int __init acpi_fadt_sanity_check(void)
 	return ret;
 }
 
+/*
+ * acpi_fixup_m400_quirks - Work-around for HPE ProLiant m400 APEI firmware
+ * problems.
+ */
+static void __init acpi_fixup_m400_quirks(void)
+{
+	acpi_status status;
+	struct acpi_table_header *header;
+#if !defined(CONFIG_ACPI_APEI)
+	int hest_disable = HEST_DISABLED;
+#endif
Yuck.
Yes, unfortunately, the hest code conditionally defines hest_disable.
quoted
+
+	if (!IS_ENABLED(CONFIG_ACPI_APEI) || hest_disable != HEST_ENABLED)
+		return;
+
+	status = acpi_get_table(ACPI_SIG_HEST, 0, &header);
+
+	if (ACPI_SUCCESS(status) && !strncmp(header->oem_id, "HPE   ", 6) &&
+		!strncmp(header->oem_table_id, "ProLiant", 8) &&
You should match the affected range of OEM table revisions too, that way a
firmware upgrade should start working, instead of being permanently disabled
because we think its unlikely.
The m400 has reached end of life.  No one really expects to see any firmware
update.  I don't know what the affected OEM table revisions are, and I don't
think there is an active platform maintainer who could give that info either.

If someone can provide the info, I'll update the fix.
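
For illustration only, a revision-bounded match could look something like
the sketch below.  It is plain userspace C, not kernel code: struct
table_header is a cut-down stand-in for struct acpi_table_header, struct
apei_quirk and quirk_matches() are hypothetical names, and the revision
bounds are placeholders because the affected OEM table revisions are not
known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Cut-down stand-in for struct acpi_table_header (assumption: only the
 * three fields used by the match are modelled here). */
struct table_header {
	char oem_id[6];		/* space padded, not NUL terminated */
	char oem_table_id[8];	/* space padded, not NUL terminated */
	uint32_t oem_revision;
};

/* Hypothetical quirk entry: rev_min/rev_max bound the affected firmware. */
struct apei_quirk {
	const char *oem_id;		/* expected to be 6 characters */
	const char *oem_table_id;	/* expected to be 8 characters */
	uint32_t rev_min;
	uint32_t rev_max;
};

/* True only when both OEM strings match and the table revision falls
 * inside the inclusive [rev_min, rev_max] range. */
static bool quirk_matches(const struct apei_quirk *q,
			  const struct table_header *h)
{
	assert(q->rev_min <= q->rev_max);

	return !strncmp(h->oem_id, q->oem_id, sizeof(h->oem_id)) &&
	       !strncmp(h->oem_table_id, q->oem_table_id,
			sizeof(h->oem_table_id)) &&
	       h->oem_revision >= q->rev_min &&
	       h->oem_revision <= q->rev_max;
}
```

With bounds like these, a firmware update that bumps the OEM revision past
rev_max would no longer match, so APEI would stay enabled on fixed firmware.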
quoted
+		MIDR_IMPLEMENTOR(read_cpuid_id()) == ARM_CPU_IMP_APM) {
How is the CPU implementer relevant?
That was just a copy of what other fixes had.  Should I remove it?
You suggest a firmware-update would make this issue go away...
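
For reference, the implementer test in the hunk above only looks at bits
[31:24] of MIDR_EL1.  A rough userspace sketch of that decode follows.  The
shift and the 0x50 implementer code mirror the arm64 cputype definitions,
while midr_implementor() and cpu_is_apm() are hypothetical helper names and
the MIDR value is passed in rather than read from the CPU.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIDR_IMPLEMENTOR_SHIFT	24
#define MIDR_IMPLEMENTOR_MASK	(0xffU << MIDR_IMPLEMENTOR_SHIFT)
#define ARM_CPU_IMP_APM		0x50	/* Applied Micro (X-Gene) */

/* Extract the implementer code from bits [31:24] of a MIDR value. */
static inline uint32_t midr_implementor(uint32_t midr)
{
	return (midr & MIDR_IMPLEMENTOR_MASK) >> MIDR_IMPLEMENTOR_SHIFT;
}

/* Hypothetical helper: true when the MIDR reports an APM-designed core. */
static inline bool cpu_is_apm(uint32_t midr)
{
	return midr_implementor(midr) == ARM_CPU_IMP_APM;
}
```

The check identifies the CPU vendor, not the firmware, so on its own it says
nothing about whether the HEST table is one of the broken ones.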

quoted
+		hest_disable = HEST_DISABLED;
+		pr_info("Disabled APEI for m400.\n");
+	}
+
+	acpi_put_table(header);
+}
+
 /*
  * acpi_boot_table_init() called from setup_arch(), always.
  *	1. find RSDP and get its address, and then find XSDT
Nothing arch-specific here. You're adding this to arch/arm64 because
drivers/acpi/apei doesn't have an existing quirks table?
There was a fix submitted that had it in drivers/acpi/scan.c, but the
ACPI maintainer said he didn't want the fix in the main ACPI code.
See:

  https://lkml.org/lkml/2018/4/19/1020 (ACPI / scan: Fix regression related to X-Gene UARTs)

The m400 is an arm64 platform, so it seems most appropriate to
have the fix in arch/arm64/kernel/acpi.c.  I followed what was done
for the x86 quirks (in arch/x86/kernel/acpi/boot.c), and what was
suggested here:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900581 (linux: Enable Buster kernel features for newer ARM64 servers)

Thanks for the review.

-Geoff