Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

From: Timothy Pearson <tpearson@raptorengineering.com>
Date: 2025-06-24 16:34:42
Also in: linux-pci, lkml


----- Original Message -----

From: "Krishna Kumar" <redacted>
To: "Timothy Pearson" <tpearson@raptorengineering.com>
Cc: "linuxppc-dev" <redacted>, "Shawn Anastasio" <redacted>, "linux-kernel"
[off-list ref], "linux-pci" [off-list ref], "Madhavan Srinivasan" [off-list ref],
"Michael Ellerman" [off-list ref], "christophe leroy" [off-list ref], "Naveen N Rao"
[off-list ref], "Bjorn Helgaas" [off-list ref]
Sent: Tuesday, June 24, 2025 2:07:30 AM
Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention indicator

On 6/21/25 8:35 PM, Timothy Pearson wrote:

quoted

----- Original Message -----

quoted

From: "Krishna Kumar" <redacted>
To: "linuxppc-dev" <redacted>, "Timothy Pearson"
[off-list ref], "Shawn
Anastasio" [off-list ref]
Cc: "linuxppc-dev" <redacted>, "linux-kernel"
[off-list ref], "linux-pci"
[off-list ref], "Madhavan Srinivasan" [off-list ref],
"Michael Ellerman" [off-list ref],
"christophe leroy" [off-list ref], "Naveen N Rao"
[off-list ref], "Bjorn Helgaas"
[off-list ref], "Shawn Anastasio" [off-list ref]
Sent: Friday, June 20, 2025 4:26:51 AM
Subject: Re: [PATCH v2 6/6] pci/hotplug/pnv_php: Enable third attention
indicator
Shawn, Timothy:

Thanks for posting the patch series. There are a few things I want to better
understand about the Raptor problem -

1. Last time, my two patches solved all of the hot unplug operation and kernel crash
issues except for the NVMe case. It did not work with NVMe since DPC support was
not there. I was not able to add that because I was occupied with other work.

With the current series all hotplug is working correctly, including not only
NVMe on root port and bridge ports, but also surprise plug of the entire PCIe
switch at the root port.  The lack of DPC support *might* be related to the PE
freeze, but in any case we prefer the hotplug driver to be able to recover from
a PE freeze (e.g. if a bridge card is faulty and needs to be replaced) without
also requiring a reboot, so I would consider DPC implementation orthogonal to
this patch set.

Sounds Good !!

quoted

2. I want to understand the delta between last year's problem and this one. Does the
PHB freeze or hot unplug failure happen only with a particular Microsemi switch, or does
it happen with all switches? When did this problem start appearing? Until last year

Hotplug has never worked reliably for us; if it worked at all, it was always
rolling the dice on whether the kernel would oops and take down the host.  Even
if the kernel didn't oops, surprise plug and auto-add / auto-remove never worked
beyond one remove operation.

I would like to see this problem, maybe during our Zoom/Teams meeting. I have not
tested surprise plug/unplug and only tested via sysfs, so you may be correct, but I
want to have a look at this problem.

quoted

    it was not there. Is it specific to particular hardware? Can I get your setup
    to test this problem and your patch?

Because you will need to be able to physically plug and unplug cards and drives
this may be a bit tricky.  Do you have access to a POWER9 host system with an
x16 PCIe slot?  If so, all you need is a Supermicro SLC-AO3G-8E2P card and some
random U.2 NVMe drives -- these cards are readily available and provide
relatively standardized OCuLink access to a Switchtec bridge.

If you don't have access to a POWER9 host, we can set you up with remote access,
but it won't show all of the crashing and problems that occur with surprise
plug unless we set up a live debug session (video call or similar).


A video call should be fine. During the call I will have a look at the existing
problem and the fix, along with the driver/kernel logs.

Sounds good.  We'll set up a machine in the DMZ for this session so you can also have access.  For anyone interested in logging on to the box for logs, can you send over an SSH public key to my Email address directly?  Will get everyone added with root access to the test box prior to the call start.

quoted

3. To me, the flow hot unplug operation --> AER triggering --> DPC support
should prevent the error from reaching the root port/CPU, and it should solve the
PHB freeze / hot unplug failure. If there are AER/EEH-related synchronization
issues, we need to solve them.

    Can you please list the issues? I will pass them to the EEH/AER team. But yeah, to
    me, if the AER implementation is correct and we add DPC support, all the errors
    will be contained by the switch itself. The PHB/root port/CPU will not be impacted
    by this and there should not be any freeze.

While this is a good goal to work toward, it only solves one possible fault
mode.  The patch series posted here will handle the general case of a PE freeze
without requiring a host reboot, which is great for high-reliability systems
where there might be a desire to replace the entire switch card (this has been
tested with the patch series and works perfectly).


You may be correct on this, and this is possible. If the driver and AER/EEH
errors/events are properly handled, then we may not need DPC in all cases. The
point of DPC was to absorb the error at the switch port itself so that it will not
reach the PHB/root port/CPU and will avoid further corruption. I was hoping that
if DPC gets enabled, we may not need an explicit reboot for drives to come up in
case of surprise hot unplug.

I do understand the logic here, and it would theoretically work, but again it's a bit more fragile than the solution we're presenting here in that it relies on another chunk of device logic to work correctly in all cases, with the consequence of a failure being a forced reboot.

With our patch series here we can hot plug and hot unplug NVMe drives all day without requiring any reboots, including surprise plug.  DPC would simply make this process a little bit faster, in that we don't have to wait a few hundred milliseconds for the PE to unfreeze and the EEH driver to give up.

But yeah, we can compare this with the current results once this support is
enabled.

quoted

4. Of course we can pick some of the fixes from the pciehp driver if they are missing
in pnv_php.c. Also, at the same time you have done some cleanup in the hot unplug
path and fixed the attention button related code. If these work fine, we can pick
them. But I want to test it.

     Please provide me a setup.

5. Only if points 3 and 4 do not solve the problem should we move to
pciehp.c. But AFAIK, PPC/PowerNV is DT based, while pciehp.c may only support
ACPI (I have to check on this).  We would need to provide PHB-related information
and maintain the related topology information via the DTB, and then it can be doable.
Also, we need to do thorough planning/testing if we choose pciehp.c.

     But yeah, let's not jump there; let's try to fix the current issues via points 3
     & 4. Point 5 will be our last option.

If possible I would like to see this series merged vs. being blocked on DPC.
Again, from where I sit DPC is orthogonal; many events can cause a PE freeze
and implementing DPC only solves one.  We do *not* want to require a host
reboot in any situation whatsoever short of a complete failure of a critical
element (e.g. the PHB itself or a CPU package); our use case as deployed is
five nines critical infrastructure, and the broken hotplug has already been the
sole reason we have not maintained 100% uptime on a key system.

If you are in a hurry and want to defer DPC for some time, I am fine with it, since
the series serves a larger purpose: handling PE freeze and getting NVMe drives working,
along with the surprise hotplug fixes.  I have gone through your pnv_php.c changes
and I am mostly fine with them. But I would like to review them again from a larger
perspective w.r.t. EEH & pciehp.c, so give me some time.

Sure, please let me know if anything is concerning.  My goal would be to have this reviewed from a code quality perspective and a v2 posted at least a couple of days before the video call.

Also, if possible, you can show me the problem/fix along with logs during the video
call. It would be great if we can meet sometime early next month, in the first week,
maybe on the 5th of July.

Let's plan for that -- this is somewhat urgent with Debian Trixie being released soon, and I want to see this repair backported.  We already ship Bookworm Debian kernels with completely broken VFIO and hotplug; our goal is to get that all fixed for Trixie and enable the functionality our customers are paying for.

I will ask a few of the EEH/AER developers to have a look at the patch and to
join the meeting if they have bandwidth. Please send the mail/invite to
krishna.kumar11@ibm.com along with this email ID. I am based in Bangalore but can
be available until 10:00 pm.

Will do, thanks!
