Thread (31 messages) 31 messages, 4 authors, 2025-07-15

Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

From: Krishna Kumar <hidden>
Date: 2025-07-07 08:01:32
Also in: linux-pci, lkml

Thanks all for the review and Thanks a bunch to Timothy for fixing the PE Freeze issue. The hotplug issues are like you fix N issue and N+1 th issue will come with new HW.

We had a meeting of around 1.5 -2.0 hr with demo, code review and log review and we decided to let these fixes go ahead. I am consolidating what we discussed -


1. Let these fixes go ahead as it solves wider and much needed customer issue and its urgently needed for time being. We have verified in live demo that nothing is broken from legacy flow and 

PE/PHB freeze, race condition, hung, oops etc has been solved correctly. Basically it fixes below issues -

root-port(phb) -> switch(having4 port)--> 2 nvme devices

1st case - only removal of EP-nvme device (surprise hotplug disables msi at root port), soft removal may work
2nd case  - If we remove switch itself (surprise hotplug disable msi at root port) .


Some explanation of problem -

PHB Freeze (Interrupt is not reaching there) - Fence - Need to reset ||
EP device on removal generated msi - goes to cpu (via root port and then apic mmio region to cpu ),
PHB/root port itself got freeze and cpu never get interrupt - No wq/event going to work - driver is noit working


One area what I thought to fix it with OPAL call is below piece of code-

ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+    if (ret) {
+        SLOT_WARN(php_slot, "PCI slot activation failed with error code %d, possible frozen PHB", ret);
+        SLOT_WARN(php_slot, "Attempting complete PHB reset before retrying slot activation\n");
+        for (i = 0; i < 3; i++) {
+            /* Slot activation failed, PHB may be fenced from a prior device failure
+             * Use the OPAL fundamental reset call to both try a device reset and clear
+             * any potentially active PHB fence / freeze
+             */
+            SLOT_WARN(php_slot, "Try %d...\n", i + 1);
+            pci_set_pcie_reset_state(php_slot->pdev, pcie_warm_reset);
+            msleep(250);
+            pci_set_pcie_reset_state(php_slot->pdev, pcie_deassert_reset);
+
+            ret = pnv_php_set_slot_power_state(slot, OPAL_PCI_SLOT_POWER_ON);
+            if (!ret)
+                break;
+        }


I would like to see this fix in skiboot, the opal call should reset it and it should work. Normally opal call is responsible for  link training and reset, so ideally it should  happens from there. As if now, Timothy has some explanation for it, so its fine for now. He can add his points for record.


2. In future we have decided to work on items like - removal of Work-queue with threaded irq, addition of dpc support in this driver.

3. We have also discussed if we want to move pciehpc.c driver in future, we have to keep below things in mind, Timothy can add some more points for record.

Device Node (DTB) & its relationship with slot, Can we decouple it and will pciehpc.c going to work correctly for this ?
Driver binding and unbinding based on hotplug event and its relationship with slot. Role of DTB in this. Our driver  also depends on OPAL call for link reset etc, can we decouple from it ? If we add some PPC specific code in pciehpc.c, how will it gets handled (by VFT/Function-Pointer or #ifdef or by seperate files ?)


Lukas has some points for above and I am somewhat aligned with below points, but it needs to be tested to see conceptually it fixes above issues, I am consolidating his points and he can add more  if needed-

Only the host bridge
has to be enumerated in the devicetree or DSDT.

pnv_php.c seems to search the devicetree for hotplug slots and
instantiates them.

We've traditionally dealt with such issues by inserting pcibios_*()
hooks in generic code, with a __weak implementation (which is usually
an empty stub) and a custom implementation in arch/powerpc/.

The linker then chooses the custom implementation over the __weak one.

You can find the existing hooks with:

git grep "__weak .*pcibios" -- drivers/pci
git grep pcibios -- arch/powerpc

An alternative method is to add a callback to struct pci_host_bridge.

pciehp is used on all kinds of arches, it's just an implementation of the PCIe Base Spec.



Best Regards,

Krishna




On 6/25/25 4:04 AM, Bjorn Helgaas wrote:
On Wed, Jun 18, 2025 at 07:37:54PM -0500, Timothy Pearson wrote:
quoted
----- Original Message -----
quoted
From: "Bjorn Helgaas" <helgaas@kernel.org>
To: "Timothy Pearson" <tpearson@raptorengineering.com>
Cc: "linuxppc-dev" <redacted>, "linux-kernel" <redacted>, "linux-pci"
[off-list ref], "Madhavan Srinivasan" [off-list ref], "Michael Ellerman" [off-list ref],
"christophe leroy" [off-list ref], "Naveen N Rao" [off-list ref], "Bjorn Helgaas"
[off-list ref], "Shawn Anastasio" [off-list ref]
Sent: Wednesday, June 18, 2025 2:01:46 PM
Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator
On Wed, Jun 18, 2025 at 11:58:59AM -0500, Timothy Pearson wrote:
quoted
 state
Weird wrapping of last word of subject to here.
I'll need to see what's up with my git format-patch setup. Apologies
for that across the multiple series.
No worries.  If you can figure out how to make your mailer use the
normal "On xxx, somebody wrote:" attribution instead of duplicating
all those headers, that would be far more useful :)
quoted
quoted
quoted
+static int pnv_php_get_raw_indicator_status(struct hotplug_slot *slot, u8
*state)
+{
+	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
+	struct pci_dev *bridge = php_slot->pdev;
+	u16 status;
+
+	pcie_capability_read_word(bridge, PCI_EXP_SLTCTL, &status);
+	*state = (status & (PCI_EXP_SLTCTL_AIC | PCI_EXP_SLTCTL_PIC)) >> 6;
Should be able to do this with FIELD_GET().
I used the same overall structure as the pciehp_hpc driver here.  Do
you want me to also fix up that driver with FIELD_GET()?
Nope, I think it's fine to keep this looking like pciehp for now.
If somebody wants to use FIELD_GET() in pciehp, I'd probably be OK
with that, but no need for you to open that can of worms.
quoted
quoted
Is the PCI_EXP_SLTCTL_PIC part needed?  It wasn't there before, commit
log doesn't mention it, and as far as I can tell, this would be the
only driver to do that.  Most expose only the attention status (0=off,
1=on, 2=identify/blink).
quoted
+	return 0;
+}
+
+
 static int pnv_php_get_attention_state(struct hotplug_slot *slot, u8 *state)
 {
 	struct pnv_php_slot *php_slot = to_pnv_php_slot(slot);
 
+	pnv_php_get_raw_indicator_status(slot, &php_slot->attention_state);
This is a change worth noting.  Previously we didn't read the AIC
state from PCI_EXP_SLTCTL at all; we used php_slot->attention_state to
keep track of whatever had been previously set via
pnv_php_set_attention_state().

Now we read the current state from PCI_EXP_SLTCTL.  It's not clear
that php_slot->attention_state is still needed at all.
It probably isn't.  It's unclear why IBM took this path at all,
given pciehp's attention handlers predate pnv-php's by many years.
quoted
Previously, the user could write any value at all to the sysfs
"attention" file and then read that same value back.  After this
patch, the user can still write anything, but reads will only return
values with PCI_EXP_SLTCTL_AIC and PCI_EXP_SLTCTL_PIC.
quoted
 	*state = php_slot->attention_state;
 	return 0;
 }
@@ -461,7 +474,7 @@ static int pnv_php_set_attention_state(struct hotplug_slot
*slot, u8 state)
 	mask = PCI_EXP_SLTCTL_AIC;
 
 	if (state)
-		new = PCI_EXP_SLTCTL_ATTN_IND_ON;
+		new = FIELD_PREP(PCI_EXP_SLTCTL_AIC, state);
This changes the behavior in some cases:

 write 0: previously turned indicator off, now writes reserved value
 write 2: previously turned indicator on, now sets to blink
 write 3: previously turned indicator on, now turns it off
If we're looking at normalizing with pciehp with an eye toward
eventually deprecating / removing pnv-php, I can't think of a better
time to change this behavior.  I suspect we're the only major user
of this code path at the moment, with most software expecting to see
pciehp-style handling.  Thoughts?
I'm OK with changing this, but I do think it would be worth calling
out the different behavior in the commit log.

Bjorn
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help