Re: [RFC PATCH 00/13] Ultra Ethernet driver introduction
From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2025-03-27 13:26:51
Also in:
linux-rdma
On Wed, Mar 26, 2025 at 05:39:52PM +0000, Sean Hefty wrote:
The PD is a problem, as it's not a transport function. It's a hardware implementation component; one which may NOT exist for a UEC NIC. (I know there are NICs which do not implement PDs and have secure RDMA transfers.)
The PD is just a concept representing security. There are lots of ways to implement it; so long as it achieves isolation, you would label it a PD, and the PD flows through all the objects that participate in the isolation. The basic essential requirement is that registered userspace memory cannot be accessed by anything outside the definition of the PD/shared PD.

This is really important. I'm quite concerned that any RDMA protocol comes with some solid definition of PD, mapped to the underlying technology, that matches Linux's inter-process security needs. For instance, Habana defined a PD as a singleton object, and the first process to get it had exclusive use of the HW. This is because their HW could not do any inter-process security.
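
To make the isolation requirement concrete, here is a minimal sketch in C. The types and the mr_access_allowed() helper are hypothetical and simplified; they are not the kernel's ib_pd/ib_mr/ib_qp objects or any real uAPI. It only illustrates the rule that an MR can be reached solely through objects sharing its PD.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified objects; NOT the kernel's ib_pd/ib_mr/ib_qp. */
struct sketch_pd { int id; };
struct sketch_mr { const struct sketch_pd *pd; };
struct sketch_qp { const struct sketch_pd *pd; };

/*
 * The isolation rule: a QP may touch an MR only when both objects
 * were created under the same PD. Anything else is refused.
 */
static bool mr_access_allowed(const struct sketch_qp *qp,
			      const struct sketch_mr *mr)
{
	return qp != NULL && mr != NULL &&
	       qp->pd != NULL && qp->pd == mr->pd;
}
```

How the check is actually enforced (in HW, firmware, or the driver) is up to the implementation; the point is only that the PD is the unit of comparison.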
I have a proposal to rework/redefine PDs to support a more general model,
It would certainly be good to have some text explaining some of the mappings to different technologies.
which I think will work for NICs that need a PD and ones that don't. It can support MR -> PD -> Job, but I considered the PD -> job relationship as 1 to many.
Yes, and the 1:1 is degenerate.
Sure, it's challenging in that a UET endpoint (QP) may communicate with multiple jobs, and an MR may be accessible by a single job, all jobs, or only a few.
I would suggest that the PD is a superset of all jobs, and the objects (endpoint, MR, etc.) get to choose a subset of the PD's jobs during allocation? Or you keep job/PD as 1:1 and allow specifying multiple PDs during object allocation.

But to be clear, this is largely verbs modeling stuff. However, there is a certain practicality to trying to fit this multi-job ability into a PD, because it allows reusing a lot of existing uAPI kernel code. Especially if people are going to take existing RDMA HW and tweak it to some level of UET (i.e. support only a single job) and still require a HW-level PD under the covers.

Jason
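
The "PD is a superset of all jobs" option above can be sketched roughly as follows. Everything here is an assumption made for illustration: the uet_pd/uet_mr names, the 64-job bitmask, and both helpers are hypothetical and do not correspond to any existing or proposed uAPI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a PD tracks up to 64 jobs as a bitmask. */
struct uet_pd { uint64_t jobs; };

/* An MR belongs to one PD and opts in to a subset of that PD's jobs. */
struct uet_mr { const struct uet_pd *pd; uint64_t jobs; };

/*
 * Allocation-time check: the requested job set must be a non-empty
 * subset of the PD's jobs, otherwise the MR is not created.
 */
static bool uet_mr_init(struct uet_mr *mr, const struct uet_pd *pd,
			uint64_t jobs)
{
	if (jobs == 0 || (jobs & ~pd->jobs) != 0)
		return false;
	mr->pd = pd;
	mr->jobs = jobs;
	return true;
}

/* Access-time check: a request for job index 'job' needs the MR's opt-in. */
static bool uet_mr_job_allowed(const struct uet_mr *mr, unsigned int job)
{
	return job < 64 && (mr->jobs & (UINT64_C(1) << job)) != 0;
}
```

With a PD holding jobs {0, 1, 2}, an MR allocated for {0, 1} would accept job 0 and refuse job 2, and an attempt to allocate an MR for job 3 would fail. The 1:1 case is the degenerate one where the PD's mask has a single bit set.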