Thread: 25 messages, 6 authors, 2018-12-27

Re: [PATCH v2] PCI/MSI: Don't touch MSI bits when the PCI device is disconnected

From: Oliver O'Halloran <oohall@gmail.com>
Date: 2018-11-12 05:50:09
Also in: linux-pci, lkml

On Thu, 2018-11-08 at 23:06 +0000, Alex_Gagniuc@Dellteam.com wrote:
> On 11/08/2018 04:51 PM, Greg KH wrote:
> > On Thu, Nov 08, 2018 at 10:49:08PM +0000, Alex_Gagniuc@Dellteam.com wrote:
> > > In the case that we're trying to fix, this code executing is a result of
> > > the device being gone, so we can guarantee race-free operation. I agree
> > > that there is a race, in the general case. As far as checking the result
> > > for all F's, that's not an option when firmware crashes the system as a
> > > result of the mmio read/write. It's never pretty when firmware gets
> > > involved.
> > If you have firmware that crashes the system when you try to read from a
> > PCI device that was hot-removed, that is broken firmware and needs to be
> > fixed.  The kernel can not work around that as again, you will never win
> > that race.
> But it's not the firmware that crashes. It's linux as a result of a
> fatal error message from the firmware. And we can't fix that because FFS
> handling requires that the system reboots [1].

Do we know the exact circumstances that result in firmware requesting a
reboot? If it happens on any PCIe error I don't see what we can do to
prevent that beyond masking UEs entirely (are we even allowed to do
that on FFS systems?).

> If we're going to say that we don't want to support FFS because it's a
> separate code path, and different flow, that's fine. I am myself, not a
> fan of FFS. But if we're going to continue supporting it, I think we'll
> continue to have to resolve these sort of unintended consequences.
>
> Alex
>
> [1] ACPI 6.2, 18.1 - Hardware Errors and Error Sources
  