Thread: 13 messages, 4 authors, 2023-09-27

Re: Questions: Should kernel panic when PCIe fatal error occurs?

From: "Oliver O'Halloran" <oohall@gmail.com>
Date: 2023-09-25 03:54:24
Also in: linux-acpi, linux-pci, lkml

On Fri, Sep 22, 2023 at 8:23 AM David Laight [off-list ref] wrote:
> > It would be nice if they worked the same, but I suspect that vendors
> > may rely on the fact that CPER_SEV_FATAL forces a restart/panic as
> > part of their system integrity story.
>
> The file system errors created by a panic (especially an NMI panic)
> could easily be more problematic than a failed PCIe data transfer.
> Even a read that returned ~0u - which can be checked for.
>
> Panicking a system that is converting TDM telephony to RTP for the
> 911 emergency service because a PCIe cable/riser connecting one of the
> TDM boards has become loose doesn't seem ideal.

For kernel native AER the default reaction to errors is
reset-and-reinit, which probably isn't much better for your case.
Sounds like you would want a knob to suppress everything except error
reporting so you can handle it in userspace?

> (Or because the TDM board's FPGA has decided it isn't going to respond
> to any accesses until the BARs are set up again...)
>
> The system can carry on with some TDM connections disabled - but that
> is ok because they are all duplicated in case a cable gets cut.

Well that's a relief :)

Oliver
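
For readers unfamiliar with the "read that returned ~0u" point quoted
above: a failed PCIe read typically completes with all bits set, so a
driver can test for that value instead of trusting the data. The sketch
below is a userspace illustration only. struct fake_dev, dev_read32(),
dev_read32_checked() and REG_STATUS are hypothetical names invented for
this example; a real driver would read through readl() on a mapped BAR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register offset, for illustration only. */
#define REG_STATUS  0x00u
/* Value a failed (e.g. link down) non-posted read typically returns. */
#define ALL_ONES_32 UINT32_MAX

/* Stand-in for a device BAR; not a real kernel structure. */
struct fake_dev {
	uint32_t regs[4];
	bool link_up;
};

/* Simulated MMIO read: all-ones when the link is down. */
uint32_t dev_read32(const struct fake_dev *dev, uint32_t off)
{
	if (!dev->link_up)
		return ALL_ONES_32;
	return dev->regs[off / 4];
}

/*
 * Checked read: returns false and leaves *val untouched when the read
 * came back as all-ones, so the caller can disable the affected
 * channel instead of acting on bogus data.
 */
bool dev_read32_checked(const struct fake_dev *dev, uint32_t off,
			uint32_t *val)
{
	uint32_t v;

	assert(val != NULL);
	v = dev_read32(dev, off);
	if (v == ALL_ONES_32)
		return false;
	*val = v;
	return true;
}
```

One caveat with this approach: a register that can legitimately hold
0xffffffff is ambiguous, so a driver would normally confirm the failure
by re-reading a register that can never be all-ones before declaring
the device gone.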