Thread (34 messages, 4 authors, 2019-09-19)

Re: [PATCH 07/14] powernv/eeh: Use generic code to handle hot resets

From: "Oliver O'Halloran" <oohall@gmail.com>
Date: 2019-09-17 07:32:55

On Tue, Sep 17, 2019 at 11:15 AM Sam Bobroff wrote:
On Tue, Sep 03, 2019 at 08:15:58PM +1000, Oliver O'Halloran wrote:
When we reset PCI devices managed by a hotplug driver the reset may
generate spurious hotplug events that cause the PCI device we're resetting
to be torn down accidentally. This is a problem for EEH (when the driver is
EEH-aware) since we want to leave the OS PCI device state intact so that
the device can be reset without losing any resources (network, disks,
etc.) provided by the driver.

Generic PCI code provides the pci_bus_error_reset() function to handle
resetting a PCI device (or bus) by using the reset method provided by the
hotplug slot driver. When the EEH core requests a hot reset (the common
case) we can use this function to do the reset without tripping over the
hotplug driver.
Could you explain a bit more about how this change solves the problem?
Is it that the hotplug driver's reset method doesn't cause spurious
hotplug events?
Yes, see the comment below.
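The mechanism can be pictured with a simplified user-space model: prefer the reset method supplied by the hotplug slot driver (which, per the patch comment, avoids spurious hotplug events) and only fall back to a plain secondary bus reset. This is a hedged sketch, not the PCI core's code; `model_slot`, `driver_reset_slot()`, `secondary_bus_reset()` and `model_bus_error_reset()` are hypothetical names, and the real pci_bus_error_reset() works on `struct pci_dev` and the bus's `struct pci_slot` list.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the idea behind pci_bus_error_reset():
 * prefer the slot driver's reset method, otherwise do a plain
 * secondary bus reset. Not kernel code; all names are invented.
 */
enum model_reset_kind { MODEL_RESET_SLOT, MODEL_RESET_BUS };

struct model_slot {
	/* Non-NULL when a hotplug driver supplies a slot reset method. */
	int (*reset_slot)(struct model_slot *slot);
	/* Link down/up transitions reported to the hotplug driver. */
	int link_events;
};

/* A hotplug driver's reset: it expects the link to bounce, so it
 * resets the slot without reporting the transitions as events. */
static int driver_reset_slot(struct model_slot *slot)
{
	(void)slot;
	return 0;
}

/* A raw secondary bus reset: the link goes down and comes back up,
 * and both transitions are reported. */
static void secondary_bus_reset(struct model_slot *slot)
{
	slot->link_events += 2;
}

static enum model_reset_kind model_bus_error_reset(struct model_slot *slot)
{
	assert(slot != NULL);

	/* Use the slot driver's reset if there is one and it succeeds. */
	if (slot->reset_slot && slot->reset_slot(slot) == 0)
		return MODEL_RESET_SLOT;

	/* Otherwise fall back to a plain secondary bus reset. */
	secondary_bus_reset(slot);
	return MODEL_RESET_BUS;
}
```

In this model a slot with `reset_slot = driver_reset_slot` takes the slot path and records no link events, while a slot with `reset_slot = NULL` falls back to the bus reset and records two.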
-     if (pci_is_root_bus(bus) ||
-         pci_is_root_bus(bus->parent))
+     if (pci_is_root_bus(bus))
              return pnv_eeh_root_reset(hose, option);

+     /*
+      * For hot resets, try to use the generic PCI error recovery reset
+      * functions. These correctly handle the case where the secondary
+      * bus is behind a hotplug slot and will use the slot-provided
+      * reset methods to prevent spurious hotplug events during the reset.
+      *
+      * Fundamental resets need to be handled internally to EEH since the
+      * PCI core doesn't really have a concept of a fundamental reset,
+      * mainly because there's no standard way to generate one. Only a
+      * few devices require an FRESET so it should be fine.
+      */
+     if (option != EEH_RESET_FUNDAMENTAL) {
+             /*
+              * NB: Skiboot and pnv_eeh_bridge_reset() also no-op the
+              *     de-assert step. It's like the OPAL reset API was
+              *     poorly designed or something...
+              */
+             if (option == EEH_RESET_DEACTIVATE)
+                     return 0;
It looks like this will prevent pnv_eeh_root_reset(bus->parent) (below)
from being called for EEH_RESET_DEACTIVATE, when it was before the
patch. Is that right?
I agree it's a little awkward, but it works fine. OPAL has always
treated the resets defined by opal_pci_reset() as being edge-triggered
rather than level-triggered, since the de-assert step has always been
implemented as a no-op. This behaviour is effectively part of the ABI
between OPAL and the kernel, since the kernel skips the de-assert step
in pnv_eeh_bridge_reset(). Although pnv_eeh_reset() uses
pnv_eeh_root_reset() to reset the secondary bus of the root port,
pnv_pci_reset_secondary_bus() still uses the bridge reset.

I should probably update the OPAL API docs to mention that. Oh well.
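One way to picture the "edge-triggered" contract: the assert call carries out the whole reset cycle, so a caller that skips the de-assert step (as the patch does for EEH_RESET_DEACTIVATE) ends up in the same state as one that issues it. The sketch below is a hypothetical model; `model_edge_reset()` and its types are invented for illustration and do not reproduce the signature of opal_pci_reset().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of an edge-triggered reset call: the assert step
 * performs a complete reset cycle and the de-assert step is a no-op.
 * These names are illustrative; they are not the OPAL or kernel API.
 */
enum model_reset_state { MODEL_ASSERT_RESET, MODEL_DEASSERT_RESET };

struct model_bridge {
	int resets_performed;	/* complete reset cycles carried out */
	int in_reset;		/* 1 while the secondary bus is held in reset */
};

static int model_edge_reset(struct model_bridge *br,
			    enum model_reset_state state)
{
	assert(br != NULL);

	switch (state) {
	case MODEL_ASSERT_RESET:
		br->in_reset = 1;	/* assert reset ... */
		/* ... hold it for the required settle time (elided) ... */
		br->in_reset = 0;	/* ... and release it again */
		br->resets_performed++;
		return 0;
	case MODEL_DEASSERT_RESET:
		/* The reset was already released above: nothing to do. */
		return 0;
	}
	return -1;	/* unknown state */
}
```

Whether or not MODEL_DEASSERT_RESET is issued after MODEL_ASSERT_RESET, `resets_performed` stays at 1 and `in_reset` at 0, which is why dropping the second call is harmless under this contract.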
+             rc = pci_bus_error_reset(bus->self);
+             if (!rc)
+                     return 0;
Is it correct to fall through and try a different reset if this fails?
The only reason I can see for the generic code failing is when config
space to the bridge is blocked by the EEH core. The internal
pnv_eeh_bridge_reset() function has the option of calling
opal_pci_reset() or using the internal EEH config accessors (which
aren't filtered) so falling back makes sense to me.
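The fallback argument can be sketched as a small model, assuming (as described above) that the generic reset only fails when config space is blocked, while the internal bridge reset can still get through. `model_hot_reset()`, `cfg_blocked` and the `-EIO` return value are hypothetical choices for illustration; they are not the helpers or error codes used by pnv_eeh_reset() or pci_bus_error_reset().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of "try the generic hot reset, fall back to the
 * platform's own bridge reset". Not kernel code; names are invented.
 */
struct model_eeh_bridge {
	bool cfg_blocked;	/* EEH core is filtering config accesses */
	int generic_resets;	/* successful generic (PCI core) resets */
	int internal_resets;	/* resets done by the platform fallback */
};

/* Stand-in for the generic path: it uses the normal, filtered config
 * accessors, so it fails while config space is blocked. */
static int model_generic_reset(struct model_eeh_bridge *br)
{
	if (br->cfg_blocked)
		return -EIO;
	br->generic_resets++;
	return 0;
}

/* Stand-in for the internal bridge reset, which can go through
 * firmware or unfiltered accessors and so still works while blocked. */
static int model_internal_reset(struct model_eeh_bridge *br)
{
	br->internal_resets++;
	return 0;
}

static int model_hot_reset(struct model_eeh_bridge *br)
{
	assert(br != NULL);

	if (model_generic_reset(br) == 0)
		return 0;

	/* Generic reset failed: fall back to the internal method. */
	return model_internal_reset(br);
}
```

With config space open the generic path does the work; with it blocked the call still succeeds, but via the internal reset.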