Thread: 34 messages, 4 authors, 2019-09-19

Re: [PATCH 12/14] powerpc/eeh: Add debugfs interface to run an EEH check

From: "Oliver O'Halloran" <oohall@gmail.com>
Date: 2019-09-17 03:40:23

On Tue, Sep 17, 2019 at 1:16 PM Sam Bobroff [off-list ref] wrote:
> On Tue, Sep 03, 2019 at 08:16:03PM +1000, Oliver O'Halloran wrote:
>> A frozen EEH PE is usually detected when an MMIO load returns an
>> all-0xFF response. When performing EEH testing with the EEH error
>> injection feature available on some platforms, there is no simple way
>> to kick off the kernel's recovery process, since any accesses from
>> userspace (usually via /dev/mem) bypass the MMIO helpers in the kernel
>> that check whether a 0xFF response is due to an EEH freeze.
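
The fast-path/slow-path split described above can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel's code: struct fake_pe and eeh_read_check_u32() are hypothetical names, and the frozen flag stands in for the check that eeh_dev_check_failure() performs in the real kernel. What it shows is that an all-ones value only triggers the check and does not by itself prove a freeze; a read that never passes through such a helper (for example one made through /dev/mem) never reaches the slow path at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a PE: "frozen" stands in for what the platform
 * would report, "checks" counts how often the slow path was taken. */
struct fake_pe {
    bool frozen;
    unsigned int checks;
};

/*
 * Hypothetical model of the EEH hook in a 32-bit MMIO read helper.
 * Returns true if the value read should be treated as an EEH freeze.
 */
static bool eeh_read_check_u32(struct fake_pe *pe, uint32_t val)
{
    assert(pe);

    /* Fast path: any value that is not all ones is ordinary data. */
    if (val != UINT32_MAX)
        return false;

    /* Slow path: all ones may be a frozen PE or genuine register
     * contents, so ask the (modelled) platform before deciding. */
    pe->checks++;
    return pe->frozen;
}
```

For example, with frozen set to false, a read of 0xFFFFFFFF goes through the slow path once and is then reported as ordinary data, which is the "0xFF byte in config space" case mentioned in the next paragraph.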

>> If a device contains a 0xFF byte in its config space, it's possible to
>> trigger the recovery process via a config space read from userspace, but
>> this is not a reliable method. If a driver is bound to the device and in
>> use, it will frequently trigger the MMIO check, but this is also
>> inconsistent.

>> To solve these problems, this patch adds a debugfs file called
>> "eeh_dev_check" which accepts a <domain>:<bus>:<dev>.<fn> string and runs
>> eeh_dev_check_failure() on it. This is the same check that's done when the
>> kernel gets a 0xFF result from a config or MMIO read, with the added
>> benefit that it can be reliably triggered from userspace.
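
The thread does not show the patch's parsing code. As a userspace sketch only, assuming the fields are hexadecimal as in the usual lspci-style notation (e.g. 0000:01:00.0), validating the <domain>:<bus>:<dev>.<fn> string before looking up the device might look like this; struct pci_addr and parse_pci_addr() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical container for a parsed PCI address. */
struct pci_addr {
    unsigned int domain, bus, dev, fn;
};

/*
 * Parse "<domain>:<bus>:<dev>.<fn>" with hex fields, e.g. "0000:01:00.0".
 * Returns true and fills *out on success; *out is untouched on failure.
 */
static bool parse_pci_addr(const char *s, struct pci_addr *out)
{
    struct pci_addr a;
    char extra;
    int n;

    assert(s && out);

    /* The trailing %c picks up whatever follows the function number. */
    n = sscanf(s, "%x:%x:%x.%x%c", &a.domain, &a.bus, &a.dev, &a.fn, &extra);
    if (n != 4 && n != 5)
        return false;               /* missing fields */
    if (n == 5 && extra != '\n')
        return false;               /* junk after the function number */

    /* PCI limits: 256 buses, 32 devices per bus, 8 functions per device. */
    if (a.bus > 0xff || a.dev > 0x1f || a.fn > 0x7)
        return false;

    *out = a;
    return true;
}
```

A single trailing newline, as appended by echo, is accepted; any other text after the function number is rejected. The domain is deliberately left unchecked in this sketch.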

>> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
>
> Looks good, and I tested it with the next patch and it seems to work.
>
> But I think you should make it clear that this does not work with
> the hardware "EEH error injection" facility accessible via debugfs in
> err_injct (that doesn't seem clear to me from the commit message).

It's not intended to be a separate mechanism in the long term. I'm
planning on converting this interface to make use of the platform-defined
error injection mechanism once I can work out how to use the PAPR ones
reliably. The idea is to use this as a generic "cause an EEH to happen
on this device" interface for userspace, which we can use in test
scripts and the like.
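
Since the stated goal is a generic "cause an EEH on this device" knob for test scripts, a test harness only needs to write the device address into the debugfs file. A minimal sketch, assuming the file accepts the bare address string described in the patch: trigger_eeh_check() is a hypothetical helper, and the file's location is passed in by the caller because this thread does not say where under debugfs (usually mounted at /sys/kernel/debug) eeh_dev_check lives.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical test-script helper: ask the kernel to run an EEH check on
 * one device by writing its address (e.g. "0000:01:00.0") to the
 * eeh_dev_check debugfs file at the caller-supplied path.
 * Returns 0 on success or a negative errno value.
 */
static int trigger_eeh_check(const char *debugfs_file, const char *pci_addr)
{
    assert(debugfs_file && pci_addr);

    int fd = open(debugfs_file, O_WRONLY);
    if (fd < 0)
        return -errno;              /* e.g. -ENOENT if the file is absent */

    size_t len = strlen(pci_addr);
    ssize_t n = write(fd, pci_addr, len);

    /* Capture errno before close() can overwrite it. */
    int err = (n < 0) ? -errno : ((size_t)n == len ? 0 : -EIO);

    close(fd);
    return err;
}
```

Returning a negative errno, in the kernel's own convention, lets a harness tell a missing file (-ENOENT, e.g. debugfs not mounted or an older kernel) apart from an address the handler rejects, which should surface as an error from write().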