Thread (76 messages, 11 authors, 2025-04-22)

Re: [RFC PATCH 00/13] Ultra Ethernet driver introduction

From: Yunsheng Lin <hidden>
Date: 2025-03-20 11:13:05
Also in: linux-rdma

On 2025/3/20 0:48, Jason Gunthorpe wrote:
On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
Hi all,
This patch-set introduces minimal Ultra Ethernet driver infrastructure and
the lowest Ultra Ethernet sublayer - the Packet Delivery Sublayer (PDS),
which underpins the entire communication model of the Ultra Ethernet
Transport[1] (UET). Ultra Ethernet is a new RDMA transport designed for
efficient AI and HPC communication.
I was away while this discussion happened so I've gone through and
read the threads, looked at the patches and I don't think I've changed
my view since I talked to Enfabrica privately on this topic almost a
year ago.

I do not agree with creating a new subsystem (or whatever you are
calling drivers/ultraeth) for a single RDMA protocol and see nothing
new here to change my mind. I would likely NAK the direction I see in
this RFC, as I have NAKed other past attempts to build RDMA HW interfaces
outside of the RDMA subsystem.

Since none of that past discussion seems to have been acknowledged or
rebutted in this series I will repeat the main points:

1) I'm aware of something like 5-7 new protocols that are competing
   for the same market as Ultra Ethernet. We can't give everyone and
   their dog a new subsystem (or whatever) and all the maintainability
   negatives that come with that. As a matter of maintainability we
   need to see consolidation here, not fragmentation!

   Yes, UE is a consortium-driven standard, which is unique and a big
   positive, but I don't believe anyone can say for certain what
   direction the industry is going to go in. Many consortium standards
   have failed to get adoption in the past even with a large number of
   member companies.

   Nor can we know what concepts in UE are going to be copied into
   other competing RDMA transports. See my other remarks on job key
   for an example. Prematurely siloing stuff in drivers/ultraeth is
   very much the wrong technical direction for maintainability.

   That said, I think UE should be in the kernel and have a fair
   chance to compete for market share. Just in a maintainable and
   appropriate way while the industry evolves.

2) Due to the above, I'm pretty confident we will see RDMA NICs
   supporting a lot of different protocols. In fact they already do.

   From a kernel maintainability perspective we really want one RDMA
   driver leveraging as much common infrastructure between the
   protocols as possible. We do not want to see a single HW driver
   further split up needlessly to other subsystems, that would be a
   big maintainability downside.

   To put a clear point on this, mlx5 has been gaining new protocols
   and fitting into the existing driver model for a number of years
   now. In fact there is speculation that UE could be implemented in
   mlx5 RDMA with minimal kernel changes. There would be no reason to
   try to mess up the driver to also interact with this stuff in
   drivers/ultraeth as seems to be proposed here.

   I think other HW will be similar. UE isn't so radically different
   that every HW path will need to diverge from classical RDMA. Nor is
   it so dissimilar to other competing proposals. We don't want
   artificial differences; we want to create things that can be re-used
   when appropriate.

   Leon's response to Bart is correct, we already have similar
   examples of almost everything UE does. Bart is also correct that
   verbs would be a PITA, but RDMA userspace has moved beyond verbs
   limitations years ago now. A lot of mlx5 stuff is not using verbs
   today, for instance. EFA and other examples use extensive stuff
   beyond verbs.
Regarding reuse of the existing RDMA subsystem for a new protocol:
Currently EFA seems to layer an RDM layer on top of the SRD
transport layer (see [1]). The RDM layer is implemented in software in
libfabric, while SRD seems to be implemented in hardware, which
provides the 'Scalable Reliable Datagram' service through the QP type
EFA_QP_DRIVER_TYPE_SRD.

I am not sure whether layers like SRD and RDM are clean layering from a
protocol design perspective.
But if the hardware implemented both the SRD and RDM layers, then there
might be two types of objects that need managing: an SRD object that
might be shared between different applications, and RDM objects that
need to be created on top of an SRD object.
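To make the two-object case concrete, here is a minimal sketch of the lifetime relationship described above, assuming one shared SRD object and per-application RDM objects that each pin it with a reference count. All names (`srd_obj`, `rdm_obj`, `srd_obj_create()`, `rdm_obj_create()`, etc.) are hypothetical and do not come from EFA, libfabric, or the kernel RDMA subsystem; a real kernel driver would more likely use `struct kref` and proper locking instead of a plain counter.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical object model (not an existing kernel or rdma-core API):
 * one SRD object shared by several applications, and per-application
 * RDM objects that can only be created on top of it.
 */
struct srd_obj {
	unsigned int id;        /* hypothetical hardware context number */
	unsigned int refcount;  /* creator's reference plus one per RDM object */
};

struct rdm_obj {
	struct srd_obj *srd;    /* the shared SRD object this RDM object rides on */
	unsigned int owner;     /* hypothetical application identifier */
};

static struct srd_obj *srd_obj_create(unsigned int id)
{
	struct srd_obj *srd = malloc(sizeof(*srd));

	if (!srd)
		return NULL;
	srd->id = id;
	srd->refcount = 1; /* reference held by the creator */
	return srd;
}

static void srd_obj_get(struct srd_obj *srd)
{
	srd->refcount++;
}

/* Drop one reference; returns 1 if this freed the SRD object. */
static int srd_obj_put(struct srd_obj *srd)
{
	assert(srd->refcount > 0);
	if (--srd->refcount == 0) {
		free(srd);
		return 1;
	}
	return 0;
}

/* An RDM object cannot outlive its SRD object, so it takes a reference. */
static struct rdm_obj *rdm_obj_create(struct srd_obj *srd, unsigned int owner)
{
	struct rdm_obj *rdm = malloc(sizeof(*rdm));

	if (!rdm)
		return NULL;
	srd_obj_get(srd);
	rdm->srd = srd;
	rdm->owner = owner;
	return rdm;
}

/* Destroying an RDM object releases its reference on the shared SRD object. */
static int rdm_obj_destroy(struct rdm_obj *rdm)
{
	int freed = srd_obj_put(rdm->srd);

	free(rdm);
	return freed;
}
```

In this sketch the creator can drop its own reference right after handing the SRD object to two applications; the SRD object then stays alive until the last RDM object built on it is destroyed, which is the kind of dependency a subsystem would have to track if both layers lived in hardware.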

As the existing RDMA subsystem doesn't seem to support the above
use case yet, and as we are discussing a possible new subsystem or
updating the existing subsystem to support a new protocol here, it would
be good to discuss whether it is possible to support the above case or
whether another new subsystem is needed for that use case too.

1. https://github.com/ofiwg/libfabric/blob/main/prov/efa/docs/efa_rdm_protocol_v4.md