Thread (7 messages) 7 messages, 4 authors, 2020-06-19

Re: powerpc/pci: [PATCH 1/1 V3] PCIE PHB reset

From: "Oliver O'Halloran" <oohall@gmail.com>
Date: 2020-06-19 06:11:21

On Wed, Jun 17, 2020 at 4:29 PM Michael Ellerman [off-list ref] wrote:
"Oliver O'Halloran" [off-list ref] writes:
quoted
On Tue, Jun 16, 2020 at 9:55 PM Michael Ellerman [off-list ref] wrote:
quoted
wenxiong@linux.vnet.ibm.com writes:
quoted
From: Wen Xiong <redacted>

Several device drivers hit EEH(Extended Error handling) when triggering
kdump on Pseries PowerVM. This patch implemented a reset of the PHBs
in pci general code when triggering kdump.
Actually it's in pseries specific PCI code, and the reset is done in the
2nd kernel as it boots, not when triggering the kdump.

You're doing it as a:

  machine_postcore_initcall(pseries, pseries_phb_reset);

But we do the EEH initialisation in:

  core_initcall_sync(eeh_init);

Which happens first.

So it seems to me that this should be called from pseries_eeh_init().
This happens to use some of the same RTAS calls as EEH, but it's
entirely orthogonal to it.
I don't agree. I mean it's literally calling EEH_RESET_FUNDAMENTAL etc.
Those RTAS calls are all documented in the EEH section of PAPR.

I guess you're saying it's orthogonal to the kernel handling an EEH and
doing the recovery process etc, which I can kind of see.
quoted
Wedging the two together doesn't make any real sense IMO since this
should be usable even with !CONFIG_EEH.
You can't turn CONFIG_EEH off for pseries or powernv.
Not yet :)
And if you could this patch wouldn't compile because it uses EEH
constants that are behind #ifdef CONFIG_EEH.
That's fixable.
If you could turn CONFIG_EEH off it would presumably be because you were
on a platform that didn't support EEH, in which case you wouldn't need
this code.
I think there's an argument to be made for disabling EEH in some
situations. A lot of drivers do a pretty poor job of recovering in the
first place so it's conceivable that someone might want to disable it
in say, a kdump kernel. That said, the real reason is mostly for the
sake of code organisation. EEH is an optional platform feature but you
wouldn't know it looking at the implementation and I'd like to stop it
bleeding into odd places. Making it buildable without !CONFIG_EEH
would probably help.
So IMO this is EEH code, and should be with the other EEH code and
should be behind CONFIG_EEH.
*shrug*

I wanted it to follow the model of the powernv implementation of the
same feature which is done immediately after initialising the
pci_controller and independent of all of the EEH setup. Although,
looking at it again I see it calls pnv_eeh_phb_reset() which is in
eeh_powernv.c so I guess that's pretty similar to what you're
suggesting.
That sounds like a good cleanup. I'm not concerned about conflicts
within arch/powerpc, I can fix them up.
quoted
quoted
quoted
+             list_for_each_entry(phb, &hose_list, list_node) {
+                     config_addr = pseries_get_pdn_addr(phb);
+                     if (config_addr == -1)
+                             continue;
+
+                     ret = rtas_call(ibm_set_slot_reset, 4, 1, NULL,
+                             config_addr, BUID_HI(phb->buid),
+                             BUID_LO(phb->buid), EEH_RESET_FUNDAMENTAL);
+
+                     /* If fundamental-reset not supported, try hot-reset */
+                     if (ret == -8)
Where does -8 come from?
There's a comment right there.
Yeah I guess. I was expecting it would map to some RTAS_ERROR_FOO value,
but it's just literally -8 in PAPR.
Yeah, as far as I can tell the meaning of the return codes are
specific to each RTAS call, it's a bit bad.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help