Re: [RFC PATCH 00/13] Ultra Ethernet driver introduction
From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2025-03-26 15:53:34
Also in:
linux-rdma
On Wed, Mar 26, 2025 at 03:29:01PM +0000, Sean Hefty wrote:
>>>> If I understand UEC's job semantics correctly, then the local scope of a job may span multiple local ports from multiple local devices. It would of course translate into device specific reservations.
>>>
>>> Agreed. I should have said job id/address has a network address scope. For example, job 3 at 10.0.0.1 _may_ be a different logical job than job 3 at 10.0.0.2. Or they could also belong to the same logical job. Or the same logical job may use different job id values for different network addresses. A device-centric model is more aligned with the RDMA stack. IMO, higher-level SW would then be responsible for configuring and managing the logical job. For example, maybe it needs to assign and configure non-RDMA resources as well. For that reason, I would push the logical job management outside the kernel subsystem.
>>
>> Like I said already, I think Job needs to be a first class RDMA object that is used by all transports that have job semantics.
>
> How do you handle or expose device specific resource allocations or restrictions, which may be needed? Should a kernel 'RDMA job manager' abstract device level resources? Consider a situation where a MR or MW should only be accessible by a specific job. When the MR is created, the device specific job resource may be needed. Should drivers need to query the job manager to map some global object to a device specific resource?
I imagine for cases like that the job would be linked to the PD and then MR -> PD -> Job. The kernel side would create any HW object for the job when the PD is created for a specific HW device.

The PD security semantic for the MR would be a little bit different in that the PD is more like a shared PD.

Jason
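
The MR -> PD -> Job linkage described above could be sketched roughly as follows. This is a plain userspace C model, not kernel code and not an existing RDMA core API: every name here (struct job, struct device, pd_create(), mr_hw_job_ctx(), MAX_DEVICES, the hw_ctx counter) is hypothetical and only illustrates the idea that the device-specific job object is created when a PD is created on that device, and that an MR reaches its job through its PD.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVICES 4	/* hypothetical limit, for the sketch only */

/* Hypothetical first-class job object, shared by all devices. */
struct job {
	unsigned int id;
	/* Per-device HW job context handle; 0 means "not created yet". */
	unsigned int hw_ctx[MAX_DEVICES];
};

/* Hypothetical device; next_hw_ctx stands in for a HW resource allocator. */
struct device {
	unsigned int index;
	unsigned int next_hw_ctx;
};

/* The PD is linked to exactly one job (assumption of this sketch). */
struct pd {
	struct device *dev;
	struct job *job;
};

/* The MR only knows its PD; the job is reached via MR -> PD -> Job. */
struct mr {
	struct pd *pd;
};

/*
 * Create a PD on a specific device for a job. The device-specific job
 * context is created here on first use, so MR creation never has to
 * query a separate job manager.
 */
int pd_create(struct pd *pd, struct device *dev, struct job *job)
{
	if (dev->index >= MAX_DEVICES)
		return -1;
	if (job->hw_ctx[dev->index] == 0)
		job->hw_ctx[dev->index] = ++dev->next_hw_ctx;
	pd->dev = dev;
	pd->job = job;
	return 0;
}

void mr_create(struct mr *mr, struct pd *pd)
{
	mr->pd = pd;
}

/* Resolve the device-specific job resource for an MR through its PD. */
unsigned int mr_hw_job_ctx(const struct mr *mr)
{
	return mr->pd->job->hw_ctx[mr->pd->dev->index];
}

/* An MR is accessible only by the job its PD is linked to. */
bool mr_accessible_by(const struct mr *mr, const struct job *job)
{
	return mr->pd->job == job;
}
```

In this model two PDs created on the same device for the same job share one device-level job context, which is one way to read the "more like a shared PD" remark; whether a real implementation would share or reference-count that object is left open here.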