Re: [PATCH 12/14] powerpc/eeh: Add debugfs interface to run an EEH check

From: "Oliver O'Halloran" <oohall@gmail.com>
Date: 2019-09-17 03:40:23

On Tue, Sep 17, 2019 at 1:16 PM Sam Bobroff [off-list ref] wrote:

On Tue, Sep 03, 2019 at 08:16:03PM +1000, Oliver O'Halloran wrote:

quoted

Detecting a frozen EEH PE usually occurs when an MMIO load returns a 0xFF
response. When performing EEH testing using the EEH error injection feature
available on some platforms there is no simple way to kick off the kernel's
recovery process, since any accesses from userspace (usually via /dev/mem)
will bypass the MMIO helpers in the kernel which check whether a 0xFF
response is due to an EEH freeze or not.

If a device contains a 0xFF byte in its config space it's possible to
trigger the recovery process via a config space read from userspace, but
this is not a reliable method. If a driver is bound to the device and in use
it will frequently trigger the MMIO check, but this is also inconsistent.

To solve these problems this patch adds a debugfs file called
"eeh_dev_check" which accepts a <domain>:<bus>:<dev>.<fn> string and runs
eeh_dev_check_failure() on it. This is the same check that's done when the
kernel gets a 0xFF result from a config or MMIO read, with the added
benefit that it can be reliably triggered from userspace.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>

Looks good, and I tested it with the next patch and it seems to work.

But I think you should make it clear that this does not work with
the hardware "EEH error injection" facility accessible via debugfs in
err_injct (that doesn't seem clear to me from the commit message).

It's not intended to be a separate mechanism in the long term. I'm
planning on converting this interface to make use of the platform-defined
error injection mechanism once I can figure out how to use the PAPR ones
reliably. The idea is to use this as a generic "cause an EEH to happen
on this device" interface for userspace which we can use in test
scripts and the like.
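
For anyone wanting to drive the "eeh_dev_check" file from a test program,
here is a rough userspace sketch of building and validating the
<domain>:<bus>:<dev>.<fn> string it accepts. This is not code from the
patch: struct pci_addr, parse_pci_addr() and request_eeh_check() are
hypothetical names, the range checks are my own assumption about what a
sensible address looks like, and the debugfs path is left as a parameter
because the commit message does not state where the file lives.

```c
#include <stdio.h>

/* Hypothetical container for a parsed PCI address; not a kernel type. */
struct pci_addr {
	unsigned int domain;
	unsigned int bus;
	unsigned int dev;
	unsigned int fn;
};

/*
 * Parse "<domain>:<bus>:<dev>.<fn>", all fields hexadecimal.
 * A single trailing newline is tolerated. Returns 0 on success and
 * -1 if the string is malformed or a field is out of range.
 */
int parse_pci_addr(const char *s, struct pci_addr *out)
{
	struct pci_addr a;
	char trailing = '\0';
	int n;

	/* The final %c only matches if something follows the function field. */
	n = sscanf(s, "%x:%x:%x.%x%c",
		   &a.domain, &a.bus, &a.dev, &a.fn, &trailing);
	if (n < 4)
		return -1;
	if (n == 5 && trailing != '\n')
		return -1;

	/* Assumed limits: 8-bit bus, 5-bit device, 3-bit function. */
	if (a.bus > 0xff || a.dev > 0x1f || a.fn > 0x7)
		return -1;

	*out = a;
	return 0;
}

/*
 * Write the address to the debugfs file at 'path' (caller supplies the
 * location). Returns 0 if the write and close succeeded, -1 otherwise.
 */
int request_eeh_check(const char *path, const struct pci_addr *a)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fprintf(f, "%04x:%02x:%02x.%x",
		     a->domain, a->bus, a->dev, a->fn) > 0;

	/* The write may only reach the kernel at close time, so check it. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

A test script would parse the address it was given, then call
request_eeh_check() with wherever debugfs exposes eeh_dev_check on the
system under test. Validating before writing keeps typos in the address
from being mistaken for a failed check.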
