Thread (10 messages) 10 messages, 3 authors, 2024-06-25

Re: [PATCH v2 2/2] powerpc: hotplug driver bridge support

From: "Oliver O'Halloran" <oohall@gmail.com>
Date: 2024-05-20 14:59:33
Also in: linux-pci, lkml

On Fri, May 17, 2024 at 9:15 PM krishna kumar [off-list ref] wrote:
quoted
Uh, if I'm reading this right it looks like your "slot" C5 is actually
the PCIe switch's internal bus which is definitely not hot pluggable.
It's a hotplug slot. Please see the snippet below:

:~$ sudo lspci -vvv -s 0004:02:00.0 | grep --color HotPlug
         SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
:~$

:~$ sudo lspci -vvv -s 0004:02:01.0 | grep --color HotPlug
        SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
:~$

:~$ sudo lspci -vvv -s 0004:02:02.0 | grep --color HotPlug
        SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
:~$

:~$ sudo lspci -vvv -s 0004:02:03.0 | grep --color HotPlug
        SltCap:    AttnBtn- PwrCtrl+ MRL- AttnInd- PwrInd- HotPlug+ Surprise-
:~$
All this is showing is that the switch downstream ports on bus 0004:02
have a slot capability. I already know that (see what I said
previously about physical links). The fact the downstream ports have a
slot capability also has absolutely nothing to do with anything I was
saying. Look at the lspci output for 0004:01:00.0 which is the
switch's upstream port. The upstream port device will not have a slot
capability because it's a bridge into the virtual PCI bus that is
internal to the switch.
It seems like your explanation about the missing 0004:01:00.0 may be
correct and could be due to a firmware bug. However, the scope of this
patch does not relate to this issue. Additionally, if it starts with
0004:01:00.0 to 0004:01:03.0, the behavior of hot-unplug and hot-plug
operations will remain inconsistent. This patch aims to address the
inconsistent behavior of hot-unplug and hot-plug.

*snip*
quoted
It might be worth adding some logic to pnv_php to verify the PCI
bridge upstream of the slot actually has the PCIe slot capability to
guard against this problem.
We can have a look at this problem in another patch.
The point of this series is to fix the behaviour of pnv_php, is it
not? Powering off a PCI(e) slot is supposed to render it safe to
remove the card  in that slot. Currently if you "power off" C5, the
kernel is still going to have active references to the switch's
upstream port device (at 0004:01:00.0) and the switch management
function (at 0004:01:00.1). If the kernel has active references to PCI
devices physically located in the slot we supposedly powered off, then
the hotplug driver isn't doing its job. The asymmetry between hot add
and removal that you're trying to fix here is a side effect of the
fact that pnv_php is advertising the wrong thing as a slot. I think
you should stop pnv_php from advertising something as a slot when it's
not actually a slot because that's the root of all your problems.
We wanted to handle the more generic case and did not want to be confined to
only one device assumption. We want to fix the current inconsistent behavior
more generically.
Right, as I said above I don't think handing the more generic case is
actually required if pnv_php is doing its job properly. It doesn't
hurt though.
Regarding the fix, the fix is obvious:
really?
We have to traverse
and find the bridge ports from DT and invoke  pci_scan_slot() on them. This will
discover and create the entry for bridge ports (0004:02:00.0 to 0004:02:00.3 on
the given bus- 0004:02). There is already an existing function, pci_scan_bridge()
which is doing invocation of pci_scan_slot () for the devices behind the bridge,
in this case for  SAS device. So eventually, we are doing a scan of all the entities
behind the slot.
I already read your patch so I'm not sure why you feel the need to
re-describe it in tedious detail.
Would you like me to combine the non-bridge and bridge cases into one? I can attempt
to do this. Hopefully, if we incorporate the iterate sibling logic case correctly,
we may not need to maintain these two separate cases for bridge and non-bridge. I
will attempt this, and if it works, I will include it in the next patch. Thanks.
Yes, do that.

Also, do not post HTML emails to linux development lists. It breaks
plain text inline quoting which makes your messages annoying to reply
to. Some linux development lists will also silently drop HTML emails.
Please talk to the other LTC engineers about how to set up your mail
client to send plain text emails to avoid these problems in the
future.

Oliver
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help