Thread: 22 messages, 8 authors, 2018-07-03

[PATCH] arm64/acpi: Add fixup for HPE m400 quirks

From: Mark Salter <hidden>
Date: 2018-06-22 15:19:24
Also in: linux-acpi

On Tue, 2018-06-19 at 11:21 +0100, James Morse wrote:
Hi Mark,

On 18/06/18 23:18, Mark Salter wrote:
quoted
On Mon, 2018-06-18 at 11:04 -0700, Geoff Levand wrote:
quoted
Thanks for all the comments, but my lack of access to an m400 platform, and
my lack of knowledge about the m400 limits what I can comment on and what I
can do.  
I can take another look at this on an m400 here.
Thanks!

quoted
I don't believe it is a
memory access to physical space with nothing attached to it.
That is what the CPER records are describing though.
Yes.
quoted
I seem to recall
an erratum with xgene-1 where such accesses cause the CPU to halt. But I could
be misremembering that. I have no trouble believing the firmware RAS code was
untested. It is probably some boilerplate code built in before RAS was supported
in the kernel.
It would be interesting to know which GHES this error is being found in, and
whether the Error Status Block points anywhere (or at an empty block) when Linux
is started from UEFI.

If there is something in the Error Status Block out of UEFI, then this must be
something triggered by UEFI, or a bug that can be fixed by UEFI clearing out the
CPER records.

https://bugzilla.redhat.com/show_bug.cgi?id=1285107
suggests Red Hat can rebuild the UEFI firmware for this box.


If there is nothing in the Error Status Block when Linux is started, surely
Linux is doing something to cause this to happen. I'd like to find out what, as
it's probably a software bug.


(The case where disabling HEST would be the right thing to do is if there is a
bogus GHES->GAS entry in GHES.0, the access to which causes GHES.1 to be
populated with 'Access to an address not mapped to any component', which we find
next. If this is the case it would be better to check GHES entries against the
UEFI memory map to check this is memory, and it was reserved.)
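The memory-map check suggested above (is the address a GHES entry points at real memory, and was it reserved?) could look roughly like the following. This is a minimal sketch in Python rather than kernel code. `MemDesc` and `ghes_address_is_sane` are hypothetical names, `MemDesc` is a simplified stand-in for `EFI_MEMORY_DESCRIPTOR`, and the set of memory types treated as "reserved" is an assumption made for illustration.

```python
from dataclasses import dataclass

# EFI_MEMORY_TYPE values from the UEFI specification (subset).
EFI_RESERVED_MEMORY_TYPE = 0
EFI_RUNTIME_SERVICES_DATA = 6
EFI_CONVENTIONAL_MEMORY = 7
EFI_ACPI_RECLAIM_MEMORY = 9
EFI_ACPI_MEMORY_NVS = 10

EFI_PAGE_SIZE = 4096  # UEFI memory-map pages are always 4 KiB

# Assumption: which types count as firmware-reserved for a GHES
# Error Status Block is a policy choice made only for this sketch.
RESERVED_TYPES = {
    EFI_RESERVED_MEMORY_TYPE,
    EFI_RUNTIME_SERVICES_DATA,
    EFI_ACPI_RECLAIM_MEMORY,
    EFI_ACPI_MEMORY_NVS,
}


@dataclass(frozen=True)
class MemDesc:
    """Hypothetical, simplified stand-in for EFI_MEMORY_DESCRIPTOR."""
    type: int
    phys_start: int
    num_pages: int

    @property
    def end(self) -> int:
        return self.phys_start + self.num_pages * EFI_PAGE_SIZE


def ghes_address_is_sane(memmap, addr: int, length: int) -> bool:
    """Return True if [addr, addr + length) lies wholly inside a single
    memory-map entry of a reserved type, i.e. it is real memory that the
    OS will not reuse. Addresses the map does not describe (or ranges
    that straddle entries) return False."""
    for desc in memmap:
        if desc.phys_start <= addr and addr + length <= desc.end:
            return desc.type in RESERVED_TYPES
    # Not described by the memory map: likely unbacked physical space.
    return False


# Example map: 16 MiB of ordinary RAM followed by 64 KiB of ACPI NVS.
memmap = [
    MemDesc(EFI_CONVENTIONAL_MEMORY, 0x8000_0000, 0x1000),
    MemDesc(EFI_ACPI_MEMORY_NVS, 0x8100_0000, 0x10),
]
```

With this map, an Error Status Block at `0x8100_0100` passes, while one in conventional RAM or at an address the map does not describe at all (the "not mapped to any component" case) is rejected.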

quoted
But the problem occurs early enough in boot where there can't be
that many things that would cause a problem on m400 and not mustang so I'll
look again.
Playing spot the difference in the dmesg, I'd check for smoke coming out of:
quoted
acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
xgene-gpio APMC0D14:00: X-Gene GPIO driver registered.
pcie_pme: probe of 0000:00:00.0:pcie001 failed with error -22
I've eliminated these by building a kernel with a minimized config and hacks (ACPI
requires PCI, so I added code to prevent the root complex from being probed). I
also eliminated all the xgene-specific devices from the config (network, SATA,
etc). I still hit the GHES panic.

I'm going to hack something to get at the GHES info earlier in boot and
check the things you mention above regarding the Error Status Block and GHES.0.

If the firmware description of the GIC is wrong in some way, disabling KVM may be
worth testing too.


Thanks,

James