Thread: 17 messages, 5 authors, 2021-09-08

Re: [PATCH 0/5] s390/pci: automatic error recovery

From: Niklas Schnelle <schnelle@linux.ibm.com>
Date: 2021-09-07 07:49:30
Also in: linux-s390, lkml

On Mon, 2021-09-06 at 21:05 -0500, Linas Vepstas wrote:
> On Mon, Sep 6, 2021 at 4:49 AM Niklas Schnelle [off-list ref] wrote:
> > I believe we might be the first implementation of PCI device recovery
> > in a virtualized setting requiring us to coordinate the device reset
> > with the hypervisor platform by issuing a disable and re-enable to the
> > platform as well as starting the recovery following a platform event.
>
> I recall none of the details, but SRIOV is a standardized system for
> sharing a PCI device across multiple virtual machines. It has detailed info
> on what the hypervisor must do, and what the local OS instance must do to
> accomplish this.

Yes and in fact on s390 we make heavy use of SR-IOV.

> It's part of the PCI standard, and it's more than a decade
> old now, maybe two. Being a part of the PCI standard, it was interoperable
> with error recovery, to the best of my recollection.

Maybe I worded things with a bit too much sensationalism and it might
even be that POWER supports error recovery also with virtualization,
though I'm not sure how far that goes.

I believe you are right that SR-IOV supports error recovery; after all,
this patch set also has to work together with SR-IOV enabled devices.
At least on s390, though, until this patch set the error recovery
performed by the hypervisor stopped in the hypervisor.

The missing part added by this patch set is coordinating with device
drivers in Linux to determine where use of a recovered device can pick
up after the PCIe level error recovery is done.

As for virtualization, this coordination of course needs to cross the
hypervisor/guest boundary, and at least for KVM+QEMU I know for a fact
that reporting a PCI error to the guest is currently just a stub that
actually completely stops the guest, so you definitely don't get smooth
error recovery there yet.

> At the time it was introduced, it got pushed very aggressively. The x86
> hypervisor vendors were aiming at the heart of zseries, and were
> militant about it.

And yet we're still here, use SR-IOV ourselves and even support Linux +
KVM as a hypervisor you can use just the same on a mainframe, an x86,
POWER, or ARM system.

> -- Linas
  