Re: [PATCH v5 2/2] PCI/AER: Print UNCOR_STATUS bits that might be ANFE
From: Bjorn Helgaas <helgaas@kernel.org>
Date: 2025-08-29 21:18:03
Also in:
linux-acpi, linux-cxl, linux-edac, linux-pci, lkml
[+cc Matt]

On Thu, Jun 20, 2024 at 10:58:57AM +0800, Zhenzhong Duan wrote:
When an Advisory Non-Fatal error(ANFE) triggers, both correctable error(CE)
status and ANFE related uncorrectable error(UE) status will be printed:

  AER: Correctable error message received from 0000:b7:02.0
  PCIe Bus Error: severity=Correctable, type=Transaction Layer, (Receiver ID)
    device [8086:0db0] error status/mask=00002000/00000000
     [13] NonFatalErr
    Uncorrectable errors that may cause Advisory Non-Fatal:
     [12] TLP

Tested-by: Yudong Wang <redacted>
Co-developed-by: "Wang, Qingshun" <redacted>
Signed-off-by: "Wang, Qingshun" <redacted>
Signed-off-by: Zhenzhong Duan <redacted>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
---
 drivers/pci/pcie/aer.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 3dcfa0191169..ba3a54092f2c 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -681,6 +681,7 @@ static void __aer_print_error(struct pci_dev *dev,
 {
 	const char **strings;
 	unsigned long status = info->status & ~info->mask;
+	unsigned long anfe_status = info->anfe_status;
 	const char *level, *errmsg;
 	int i;

@@ -701,6 +702,20 @@ static void __aer_print_error(struct pci_dev *dev,
 				info->first_error == i ? " (First)" : "");
 	}
 	pci_dev_aer_stats_incr(dev, info);
+
+	if (!anfe_status)
+		return;

__aer_print_error() is used by both native AER handling, where Linux fields the AER interrupt and reads the AER status registers directly, and APEI GHES firmware-first error handling, where platform firmware fields the AER interrupt, reads the AER status registers, and packages them up to hand off to Linux via aer_recover_queue(). But the previous patch only sets info->anfe_status for the native path, so the APEI GHES path doesn't get the benefit of this change. I think both paths should log the same ANFE information.
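
To illustrate what I mean, here is a rough user-space sketch, not kernel code. Every name in it (mock_aer_regs, mock_err_info, mock_anfe_status(), and the two path functions) is hypothetical. The filter is also an assumption on my part: it treats an unmasked, Non-Fatal-severity uncorrectable status bit as an ANFE candidate whenever the Advisory Non-Fatal bit (bit 13) is set in the correctable status, which may not match exactly what the previous patch computes. The point is only that one helper working on raw register values could serve both callers:

```c
#include <stdint.h>

/* Bit 13 of the Correctable Error Status register: Advisory Non-Fatal Error. */
#define MOCK_COR_ADV_NFAT (1u << 13)

/* Hypothetical stand-in for the AER register values either path has in hand. */
struct mock_aer_regs {
	uint32_t cor_status;
	uint32_t uncor_status;
	uint32_t uncor_mask;
	uint32_t uncor_severity;	/* bit set = Fatal, bit clear = Non-Fatal */
};

/* Hypothetical stand-in for the fields of the error info used when printing. */
struct mock_err_info {
	uint32_t status;
	uint32_t anfe_status;
};

/*
 * Assumed filter: uncorrectable bits that are logged, unmasked, and
 * configured as Non-Fatal, and only when ANFE was actually signaled.
 */
static uint32_t mock_anfe_status(const struct mock_aer_regs *regs)
{
	if (!(regs->cor_status & MOCK_COR_ADV_NFAT))
		return 0;

	return regs->uncor_status & ~regs->uncor_mask & ~regs->uncor_severity;
}

/* Common fill routine so both paths end up with identical info. */
static void mock_fill_info(struct mock_err_info *info,
			   const struct mock_aer_regs *regs)
{
	info->status = regs->cor_status;
	info->anfe_status = mock_anfe_status(regs);
}

/* Native path: the OS read the registers from config space itself. */
static void mock_native_path(struct mock_err_info *info,
			     const struct mock_aer_regs *from_config_space)
{
	mock_fill_info(info, from_config_space);
}

/* Firmware-first path: firmware read the registers and handed them over. */
static void mock_ghes_path(struct mock_err_info *info,
			   const struct mock_aer_regs *from_firmware)
{
	mock_fill_info(info, from_firmware);
}
```

With the register values from the commit log example (correctable status 0x00002000 and bit 12 set in the uncorrectable status), both mock paths would fill in the same anfe_status, so the print routine would not need to care which path produced the info.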
+
+	strings = aer_uncorrectable_error_string;
+	pci_printk(level, dev, "Uncorrectable errors that may cause Advisory Non-Fatal:\n");
+
+	for_each_set_bit(i, &anfe_status, 32) {
+		errmsg = strings[i];
+		if (!errmsg)
+			errmsg = "Unknown Error Bit";
+
+		pci_printk(level, dev, " [%2d] %s\n", i, errmsg);
+	}
 }

 void aer_print_error(struct pci_dev *dev, struct aer_err_info *info)
--
2.34.1