Re: RDMA will be reverted
From: Steve Wise <hidden>
Date: 2006-07-05 17:50:36
On Wed, 2006-07-05 at 12:09 -0500, Tom Tucker wrote:
> On Sat, 2006-07-01 at 16:26 +0200, Andi Kleen wrote:
> > On Saturday 01 July 2006 01:01, Tom Tucker wrote:
> > > On Fri, 2006-06-30 at 14:16 -0700, David Miller wrote:
> > > > The TOE folks have tried to submit their hooks and drivers
> > > > on several occasions, and we've rejected it every time.
> > >
> > > iWARP != TOE
> >
> > Perhaps a good start of that discussion David asked for would be
> > if you could give us an overview of the differences and how you
> > avoid the TOE problems.
>
> I think Roland already gave the high-level overview. For those
> interested in some of the details, the API for iWARP transports was
> originally conceived independently from IB and is documented in the
> RDMAC Verbs Specification found here:
>
> http://www.rdmaconsortium.org/home/draft-hilland-iwarp-verbs-v1.0-RDMAC.pdf
>
> The protocols, etc. are available here:
>
> http://www.ietf.org/html.charters/rddp-charter.html
>
> As Roland mentioned, the RDMAC verbs are *very* similar to the IB
> verbs, so when we were thinking about how to design an API for
> iWARP we concluded it would be best to leverage the tremendous
> amount of work already done for IB by OpenFabrics and then work
> iteratively to extend this API to include features unique to iWARP.
>
> This work has been ongoing since September of 2005. There is an open
> source svn repository available for the iWARP source at
> https://openib.org/svn/gen2/branches/iwarp. There is also an open
> source NFS over RDMA implementation for Linux available here:
> http://sourceforge.net/projects/nfs-rdma.
>
> So how do we avoid the TOE pitfalls with iWARP? I think it depends
> on the pitfall. At the low level:
>
> - Stale Network/Address Information: Path MTU Change, ICMP Redirect
>   and ARP next hop changes need netlink notifier events so that
>   hardware can be updated when they change. I see this support as an
>   extension (new events) to an existing service and a relatively
>   low level of "parallel stack integration". iSCSI and IB could also
>   benefit from these events.
>
> - Port Space Collision, i.e. socket apps and rdma/iWARP apps collide
>   on a port number: The RDMA CMA needs to be able to allocate and
>   de-allocate port numbers; however, the services that do this today
>   are not exported and would need some minor tweaking. iSCSI and IB
>   benefit from these services as well.
>
> - netfilter rules, syn-flood, conn-rate, etc. You pointed out that
>   if connection establishment were done in the native stack (SYN,
>   SYN/ACK), this would account for the bulk of the netfilter
>   utility; however, this probably results in falling into many of
>   the TOE traps people have issue with.

However, iWARP devices _could_ integrate with netfilter. For most
devices, the connection request event (SYN) gets passed up to the host
driver, so the driver can enforce filter rules at that point. Also, I
think a notification-type mechanism could be used to trigger iWARP
drivers to re-apply filter rules on existing connections and kill the
ones that should be filtered. I'm not that familiar with netfilter yet,
but I think this could all be done...

Steve.
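
To make the two hooks described above a bit more concrete, here is a toy user-space sketch in Python. It is not kernel code and does not use the netfilter API; every name in it (ConnRequest, DropRule, ToyIwarpDriver, on_connect_request, on_rules_changed) is hypothetical and only models the idea: check the rules when the connection request reaches the host driver, and on a rule-change notification walk the existing connections and kill the ones that are now filtered.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class ConnRequest:
    """Hypothetical stand-in for the connection request (SYN) event."""
    src: str
    dst_port: int


@dataclass(frozen=True)
class DropRule:
    """Hypothetical filter rule: drop traffic from src_net to dst_port."""
    src_net: str
    dst_port: int

    def matches(self, req):
        return (req.dst_port == self.dst_port
                and ip_address(req.src) in ip_network(self.src_net))


@dataclass
class ToyIwarpDriver:
    """Toy model of a host driver; assumes the device passes SYNs up."""
    rules: list = field(default_factory=list)
    established: list = field(default_factory=list)

    def _filtered(self, req):
        return any(rule.matches(req) for rule in self.rules)

    def on_connect_request(self, req):
        """Enforce the rules before the connection is accepted."""
        if self._filtered(req):
            return False
        self.established.append(req)
        return True

    def on_rules_changed(self, new_rules):
        """Notification hook: re-apply rules, kill connections now filtered."""
        self.rules = list(new_rules)
        killed = [c for c in self.established if self._filtered(c)]
        self.established = [c for c in self.established
                            if not self._filtered(c)]
        return killed


drv = ToyIwarpDriver(rules=[DropRule("10.0.0.0/8", 2049)])
drv.on_connect_request(ConnRequest("10.1.2.3", 2049))     # rejected
drv.on_connect_request(ConnRequest("192.168.1.5", 2049))  # accepted
drv.on_rules_changed([DropRule("192.168.0.0/16", 2049)])  # kills the second
```

A real driver would of course have to get the rules and the change notification from netfilter itself; how that interface would look is exactly the open question here.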