Thread (3 messages) 3 messages, 2 authors, 2025-11-20

Re: [PATCH 1/1] powerpc/eeh: fix recursive pci_lock_rescan_remove locking in EEH event handling

From: Mahesh J Salgaonkar <mahesh@linux.ibm.com>
Date: 2025-11-18 08:56:51
Also in: lkml

On 2025-11-05 00:40:52 Wed, Narayana Murty N wrote:
The recent commit 1010b4c012b0 ("powerpc/eeh: Make EEH driver device
hotplug safe") restructured the EEH driver to improve synchronization
with the PCI hotplug layer.

However, it inadvertently moved pci_lock_rescan_remove() outside its
intended scope in eeh_handle_normal_event(), leading to broken PCI
error reporting and improper EEH event triggering. Specifically,
eeh_handle_normal_event() acquired pci_lock_rescan_remove() before
calling eeh_pe_bus_get(), but eeh_pe_bus_get() itself attempts to
acquire the same lock internally, causing nested locking and disrupting
normal EEH event handling paths.

This patch adds a boolean parameter do_lock to _eeh_pe_bus_get(),
with two public wrappers:
    eeh_pe_bus_get() with locking enabled.
    eeh_pe_bus_get_nolock() that skips locking.

Callers that already hold pci_lock_rescan_remove() now use
eeh_pe_bus_get_nolock() to avoid recursive lock acquisition.

Additionally, pci_lock_rescan_remove() calls are restored to the correct
position—after eeh_pe_bus_get() and immediately before iterating affected
PEs and devices. This ensures EEH-triggered PCI removes occur under proper
bus rescan locking without recursive lock contention.
[...]
quoted hunk ↗ jump to hunk
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index ef78ff77cf8f..492fae5e3823 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -812,6 +812,35 @@ static void eeh_clear_slot_attention(struct pci_dev *pdev)
 	ops->set_attention_status(slot->hotplug, 0);
 }
 
+static const char *eeh_pe_get_loc(struct eeh_pe *pe)
+{
So it is duplicate of eeh_pe_loc_get() with nolock variant. Instead, can
we split original function eeh_pe_loc_get() ? Move the while (bus) loop
into another function with name eeh_bus_loc_get(bus) which will be
nolock variant and use that here ?
quoted hunk ↗ jump to hunk
+	struct pci_bus *bus = eeh_pe_bus_get_nolock(pe);
+	struct device_node *dn;
+	const char *location = NULL;
+
+	while (bus) {
+		dn = pci_bus_to_OF_node(bus);
+		if (!dn) {
+			bus = bus->parent;
+			continue;
+		}
+
+		if (pci_is_root_bus(bus))
+			location = of_get_property(dn, "ibm,io-base-loc-code",
+						   NULL);
+		else
+			location = of_get_property(dn, "ibm,slot-location-code",
+						   NULL);
+
+		if (location)
+			return location;
+
+		bus = bus->parent;
+	}
+
+	return "N/A";
+}
+
 /**
  * eeh_handle_normal_event - Handle EEH events on a specific PE
  * @pe: EEH PE - which should not be used after we return, as it may
@@ -846,7 +875,7 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 
 	pci_lock_rescan_remove();
 
-	bus = eeh_pe_bus_get(pe);
+	bus = eeh_pe_bus_get_nolock(pe);
 	if (!bus) {
 		pr_err("%s: Cannot find PCI bus for PHB#%x-PE#%x\n",
 			__func__, pe->phb->global_number, pe->addr);
@@ -886,14 +915,14 @@ void eeh_handle_normal_event(struct eeh_pe *pe)
 	/* Log the event */
 	if (pe->type & EEH_PE_PHB) {
 		pr_err("EEH: Recovering PHB#%x, location: %s\n",
-			pe->phb->global_number, eeh_pe_loc_get(pe));
+			pe->phb->global_number, eeh_pe_get_loc(pe));
 	} else {
 		struct eeh_pe *phb_pe = eeh_phb_pe_get(pe->phb);
 
 		pr_err("EEH: Recovering PHB#%x-PE#%x\n",
 		       pe->phb->global_number, pe->addr);
 		pr_err("EEH: PE location: %s, PHB location: %s\n",
-		       eeh_pe_loc_get(pe), eeh_pe_loc_get(phb_pe));
+		       eeh_pe_get_loc(pe), eeh_pe_get_loc(phb_pe));
 	}
 
Thanks,
-Mahesh.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help