Thread (12 messages) 12 messages, 3 authors, 2024-12-11

Re: [PATCH 2/2] ACPI: extlog: Trace CPER PCI Express Error Section

From: Dan Williams <hidden>
Date: 2024-12-11 01:52:08
Also in: linux-acpi, linux-pci, lkml

Fabio M. De Francesco wrote:
On Tuesday, August 6, 2024 9:56:24 PM GMT+2 Dan Williams wrote:
quoted
Fabio M. De Francesco wrote:
quoted
Currently, extlog_print() (ELOG) only reports CPER PCIe section (UEFI
v2.10, Appendix N.2.7) to the kernel log via print_extlog_rcd().
I think the critical detail is is that print_extlog_rcd() is only
triggered when ras_userspace_consumers() returns true. The observation
is that ras_userspace_consumers() hides information from the trace path
when the intended purpose of it was to hide duplicate emissions to the
kernel log when userspace is watching the tracepoints.

Setting aside whether ras_userspace_consumers() is still a good idea or
not, it is obvious that this patch as is may surprise environments that
start seeing kernel error logs where the kernel was silent before.

I think the path of least surprise would be to make sure that
pci_print_aer() optionally skips emitting to the kernel log when not
needed wanted.
Sorry for replying so late...

I'm not entirely sure that users would not prefer to be surprised by 
_finally_ seeing kernel error logs for failing PCIe components. I suspect 
that users might have been confused by not seeing any output.
2 notes:

* New KERN_ERR prints are often found to be unwelcome. When the kernel starts
  printing new error messages it causes sysadmins to scramble.

* The future of RAS is trace-events. Any new RAS messages to the kernel
  log need to ask the question, "is userspace better served by
  registering for a RAS trace event, rather than parsing kernel log
  messsages".

[..]
I need to be sure that I understood...

void pci_print_aer(char *level, struct pci_dev *dev, int aer_severity,
                   struct aer_capability_regs *aer)
{
        [...]

        if (printk_get_level(level) <= console_loglevel) {
                pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n",
                        status, mask);
No, the code would be:

    pci_printk(level, dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);

...i.e. just pass @level rather than open code "if
(printk_get_level(level) <= console_loglevel)".
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help