Thread (15 messages) 15 messages, 4 authors, 2024-02-08

Re: [PATCH v2 1/4] PCI/AER: Store more information in aer_err_info

From: Wang, Qingshun <hidden>
Date: 2024-01-31 08:04:42
Also in: linux-acpi, linux-cxl, linux-edac, linux-pci, lkml

On Tue, Jan 30, 2024 at 06:26:39PM -0800, Kuppuswamy Sathyanarayanan wrote:
On 1/24/24 10:27 PM, Wang, Qingshun wrote:
quoted
When Advisory Non-Fatal errors are raised, both correctable and
Maybe you can start with same info about what Advisory Non-FataL
errors are and the specification reference. I know that you included
it in cover letter. But it is good to include it in commit log.
Good idea, thanks!
quoted
uncorrectable error statuses will be set. The current kernel code cannot
store both statuses at the same time, thus failing to handle ANFE properly.
In addition, to avoid clearing UEs that are not ANFE by accident, UE
Please add some details about the impact of not clearing them.
Makes sense, will do.
quoted
severity and Device Status also need to be recorded: any fatal UE cannot
be ANFE, and if Fatal/Non-Fatal Error Detected is set in Device Status, do
not take any assumption and let UE handler to clear UE status.

Store status and mask of both correctable and uncorrectable errors in
aer_err_info. The severity of UEs and the values of the Device Status
register are also recorded, which will be used to determine UEs that should
be handled by the ANFE handler. Refactor the rest of the code to use
cor/uncor_status and cor/uncor_mask fields instead of status and mask
fields.

Signed-off-by: "Wang, Qingshun" <redacted>
---
 drivers/acpi/apei/ghes.c | 10 ++++-
 drivers/cxl/core/pci.c   |  6 ++-
 drivers/pci/pci.h        |  8 +++-
 drivers/pci/pcie/aer.c   | 93 ++++++++++++++++++++++++++--------------
 include/linux/aer.h      |  4 +-
 include/linux/pci.h      | 27 ++++++++++++
 6 files changed, 111 insertions(+), 37 deletions(-)

......
@@ -1213,38 +1233,49 @@ int aer_get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
 	int temp;
 
 	/* Must reset in this function */
-	info->status = 0;
+	info->cor_status = 0;
+	info->uncor_status = 0;
+	info->uncor_severity = 0;
 	info->tlp_header_valid = 0;
 
 	/* The device might not support AER */
 	if (!aer)
 		return 0;
 
-	if (info->severity == AER_CORRECTABLE) {
+	if (info->severity == AER_CORRECTABLE ||
+	    info->severity == AER_NONFATAL ||
+	    type == PCI_EXP_TYPE_ROOT_PORT ||
+	    type == PCI_EXP_TYPE_RC_EC ||
+	    type == PCI_EXP_TYPE_DOWNSTREAM) {

It looks like you are reading both uncorrectable and correctable status
by default for both NONFATAL and CORRECTABLE errors. Why not do
it conditionally only for ANFE errors?
My initial purpose was the value will be used in aer_event trace in
PATCH 4 under both conditions, but I can also add checks here to reduce
unnecessary IO and remove the checks in PATCH 4.
quoted
+		/* Link is healthy for IO reads */
 		pci_read_config_dword(dev, aer + PCI_ERR_COR_STATUS,
-			&info->status);
+				      &info->cor_status);
 		pci_read_config_dword(dev, aer + PCI_ERR_COR_MASK,
-			&info->mask);
 
......
diff --git a/include/linux/pci.h b/include/linux/pci.h
index add9368e6314..259812620d4d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -318,6 +318,33 @@ struct pci_sriov;
 struct pci_p2pdma;
 struct rcec_ea;
 
+struct pcie_capability_regs {
+	u8 pcie_cap_id;
+	u8 next_cap_ptr;
+	u16 pcie_caps;
+	u32 device_caps;
+	u16 device_control;
+	u16 device_status;
+	u32 link_caps;
+	u16 link_control;
+	u16 link_status;
+	u32 slot_caps;
+	u16 slot_control;
+	u16 slot_status;
+	u16 root_control;
+	u16 root_caps;
+	u32 root_status;
+	u32 device_caps_2;
+	u16 device_control_2;
+	u16 device_status_2;
+	u32 link_caps_2;
+	u16 link_control_2;
+	u16 link_status_2;
+	u32 slot_caps_2;
+	u16 slot_control_2;
+	u16 slot_status_2;
+};
+
IIUC, this struct is only used drivers/acpi/apei/ghes.c . Why not define it in that file?
You are right. Whenever we need it elsewhere, we can move it to the
header file anyway.
quoted
 /* The pci_dev structure describes PCI devices */
 struct pci_dev {
 	struct list_head bus_list;	/* Node in per-bus list */
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
--
Best regards,
Wang, Qingshun
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help