Thread: 14 messages, 4 authors, 2026-04-02

Re: [PATCH v7 1/3] PCI: AtomicOps: Do not enable requests by RCiEPs

From: "Kuehling, Felix" <felix.kuehling@amd.com>
Date: 2026-03-31 18:39:39
Also in: linux-pci, linux-rdma, linux-s390, lkml

On 2026-03-31 14:09, Bjorn Helgaas wrote:
On Mon, Mar 30, 2026 at 08:01:57PM -0400, Kuehling, Felix wrote:
quoted
On 2026-03-30 17:42, Bjorn Helgaas wrote:
quoted
[+to amdgpu, bnxt_re, mlx5 IB, qedr, mlx5 maintainers]

On Mon, Mar 30, 2026 at 03:09:44PM +0200, Gerd Bayer wrote:
quoted
Since root complex integrated end points (RCiEPs) attach to a bus that
has no bridge device describing the root port, the capability to
complete AtomicOps requests cannot be determined with PCIe methods.

Change default of pci_enable_atomic_ops_to_root() to not enable
AtomicOps requests on RCiEPs.
I know I suggested this because there's nothing explicit that tells us
whether the RC supports atomic ops from RCiEPs [1].  But I'm concerned
that GPUs, infiniband HCAs, and NICs that use atomic ops may be
implemented as RCiEPs and would be broken by this.
FWIW, on AMD APUs our driver doesn't call pci_enable_atomic_ops_to_root. It
just assumes that the GPU can do atomic accesses because it doesn't actually
go through PCIe: https://elixir.bootlin.com/linux/v6.19.10/source/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c#L4785
What does this mean for the other branch that *does* use
pci_enable_atomic_ops_to_root()?  Can any of those devices be RCiEPs?
Most AMD GPUs are not integrated endpoints. APUs are integrated. There
are A+A GPUs where the GPUs are separate from the CPU but part of the
same coherent data fabric as the CPU
(adev->gmc.xgmi.connected_to_cpu == true). Those may also be considered
RCiEPs. (I'm not sure about that; is there an easy way to check with
lspci?) We may need to include them in the same branch as APUs.

You can see that we did that for a new generation of A+A GPU here: 
https://gitlab.freedesktop.org/agd5f/linux/-/blob/amd-staging-drm-next/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c?ref_type=heads#L3920. 
We'd need to confirm that the same works for MI200 A+A GPUs as well.
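
To make the branching described above concrete, here is a minimal user-space sketch. It is not the amdgpu code: struct gpu_model, gpu_has_atomics() and the stub callbacks are hypothetical names that only model the decision. Treating fabric-attached A+A parts like APUs is the assumption under discussion, not something confirmed for MI200.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the fields the decision depends on. */
struct gpu_model {
	bool is_apu;                /* GPU integrated with the CPU */
	bool xgmi_connected_to_cpu; /* A+A: shares the coherent fabric */
};

/* Stand-ins for the result of pci_enable_atomic_ops_to_root(). */
int stub_enable_ok(void)   { return 0; }
int stub_enable_fail(void) { return -EINVAL; }

/*
 * APUs (and, by assumption, fabric-attached A+A GPUs) do not route
 * atomics through PCIe, so atomics are assumed to work and the PCI
 * helper is never called.  Everything else has to negotiate.
 */
bool gpu_has_atomics(const struct gpu_model *gpu,
		     int (*enable_atomics_to_root)(void))
{
	if (gpu->is_apu || gpu->xgmi_connected_to_cpu)
		return true;

	return enable_atomics_to_root() == 0;
}

/* Example devices for the three cases. */
const struct gpu_model example_apu      = { .is_apu = true };
const struct gpu_model example_a_plus_a = { .xgmi_connected_to_cpu = true };
const struct gpu_model example_discrete = { 0 };

void gpu_has_atomics_examples(void)
{
	/* The integrated cases ignore the PCI result entirely. */
	assert(gpu_has_atomics(&example_apu, stub_enable_fail));
	assert(gpu_has_atomics(&example_a_plus_a, stub_enable_fail));
	/* A discrete GPU depends on what the PCI core reports. */
	assert(!gpu_has_atomics(&example_discrete, stub_enable_fail));
}
```

With this shape, a change in what pci_enable_atomic_ops_to_root() returns for RCiEPs only affects devices that fall through to the last branch.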

Regards,
   Felix

quoted
quoted
These drivers use pci_enable_atomic_ops_to_root():

    amdgpu
    bnxt_re (infiniband)
    mlx5 (infiniband)
    qedr (infiniband)
    mlx5 (ethernet)

Maybe we should assume that because RCiEPs are directly integrated
into the RC, the RCiEP would only allow AtomicOp Requester Enable to
be set if the RC supports atomic ops?

I don't like making assumptions like that, but it'd be worse to break
these devices.

[1] https://lore.kernel.org/all/20260326164002.GA1325368@bhelgaas
quoted
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
---
   drivers/pci/pci.c | 5 ++---
   1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 8479c2e1f74f1044416281aba11bf071ea89488a..135e5b591df405e87e7f520a618d7e2ccba55ce1 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3692,15 +3692,14 @@ int pci_enable_atomic_ops_to_root(struct pci_dev *dev, u32 cap_mask)
   	/*
   	 * Per PCIe r4.0, sec 6.15, endpoints and root ports may be
-	 * AtomicOp requesters.  For now, we only support endpoints as
-	 * requesters and root ports as completers.  No endpoints as
+	 * AtomicOp requesters.  For now, we only support (legacy) endpoints
+	 * as requesters and root ports as completers.  No endpoints as
   	 * completers, and no peer-to-peer.
   	 */
   	switch (pci_pcie_type(dev)) {
   	case PCI_EXP_TYPE_ENDPOINT:
   	case PCI_EXP_TYPE_LEG_END:
-	case PCI_EXP_TYPE_RC_END:
   		break;
   	default:
   		return -EINVAL;
-- 
2.51.0
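
For readers who want to see the effect of the hunk in isolation, the following is a user-space sketch of the type check as it would read with the patch applied. The constants mirror include/uapi/linux/pci_regs.h; model_pcie_type() and model_requester_type_check() are hypothetical stand-ins for pci_pcie_type() and the switch in pci_enable_atomic_ops_to_root(), and the rest of that function (walking up to the root port and checking completer capabilities) is deliberately left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_FLAGS_TYPE     0x00f0 /* Device/Port type field */
#define PCI_EXP_TYPE_ENDPOINT  0x0    /* Express Endpoint */
#define PCI_EXP_TYPE_LEG_END   0x1    /* Legacy Endpoint */
#define PCI_EXP_TYPE_ROOT_PORT 0x4    /* Root Port */
#define PCI_EXP_TYPE_RC_END    0x9    /* Root Complex Integrated Endpoint */

/* Hypothetical stand-in for pci_pcie_type(): decode bits 7:4. */
int model_pcie_type(uint16_t pcie_flags)
{
	return (pcie_flags & PCI_EXP_FLAGS_TYPE) >> 4;
}

/*
 * Model of the switch after the patch: only (legacy) endpoints pass,
 * so an RCiEP now takes the default branch and gets -EINVAL.
 */
int model_requester_type_check(uint16_t pcie_flags)
{
	switch (model_pcie_type(pcie_flags)) {
	case PCI_EXP_TYPE_ENDPOINT:
	case PCI_EXP_TYPE_LEG_END:
		return 0;
	default:
		return -EINVAL;
	}
}

void model_requester_type_examples(void)
{
	/* 0x0002: capability version 2, type 0x0 (Express Endpoint). */
	assert(model_requester_type_check(0x0002) == 0);
	/* 0x0092: capability version 2, type 0x9 (RCiEP). */
	assert(model_requester_type_check(0x0092) == -EINVAL);
}
```

Before the patch, PCI_EXP_TYPE_RC_END sat in the accepting case list, so the second example would have returned 0 at this step.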