Re: [RFC PATCH 2/2] RDMA/rxe: Add RDMA Atomic Write operation
From: Tom Talpey <tom@talpey.com>
Date: 2021-12-31 15:09:54
On 12/31/2021 3:29 AM, yangx.jy@fujitsu.com wrote:
On 2021/12/31 5:39, Tom Talpey wrote:quoted
On 12/30/2021 7:14 AM, Xiao Yang wrote:quoted
This patch implements RDMA Atomic Write operation for RC service. Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com> --- drivers/infiniband/sw/rxe/rxe_comp.c | 4 +++ drivers/infiniband/sw/rxe/rxe_opcode.c | 18 ++++++++++ drivers/infiniband/sw/rxe/rxe_opcode.h | 3 ++ drivers/infiniband/sw/rxe/rxe_qp.c | 3 +- drivers/infiniband/sw/rxe/rxe_req.c | 10 ++++-- drivers/infiniband/sw/rxe/rxe_resp.c | 49 +++++++++++++++++++++----- include/rdma/ib_pack.h | 2 ++ include/rdma/ib_verbs.h | 2 ++ include/uapi/rdma/ib_user_verbs.h | 2 ++ 9 files changed, 81 insertions(+), 12 deletions(-)<snip>quoted
+/* Guarantee atomicity of atomic write operations at the machine level. */ +static DEFINE_SPINLOCK(atomic_write_ops_lock); + +static enum resp_states process_atomic_write(struct rxe_qp *qp, + struct rxe_pkt_info *pkt) +{ + u64 *src, *dst; + struct rxe_mr *mr = qp->resp.mr; + + src = payload_addr(pkt); + + dst = iova_to_vaddr(mr, qp->resp.va + qp->resp.offset, sizeof(u64)); + if (!dst || (uintptr_t)dst & 7) + return RESPST_ERR_MISALIGNED_ATOMIC; + + spin_lock_bh(&atomic_write_ops_lock); + *dst = *src; + spin_unlock_bh(&atomic_write_ops_lock);Spinlock protection is insufficient! Using a spinlock protects only the atomicity of the store operation with respect to another remote atomic_write operation. But the semantics of RDMA_ATOMIC_WRITE go much further. The operation requires a fully atomic bus transaction, across the memory complex and with respect to CPU, PCI, and other sources. And this guarantee needs to apply to all architectures, including ones with noncoherent caches (software consistency). Because RXE is a software provider, I believe the most natural approach here is to use an atomic64_set(dst, *src). But I'm not certain this is supported on all the target architectures, therefore it may require some additional failure checks, such as stricter alignment than testing (dst & 7), or returning a failure if atomic64 operations are not available. I especially think the ARM and PPC platforms will need an expert review.Hi Tom, Thanks a lot for the detailed suggestion. I will try to use atomic64_set() and add additional failure checks. By the way, process_atomic() uses the same method(spinlock + assignment expression), so do you think we also need to update it by using atomic64_set()?
It is *criticial* for you to understand that the ATOMIC_WRITE has a much stronger semantic than ordinary RDMA atomics. The ordinary atomics are only atomic from the perspective of a single connection and a single adapter. And the result is not placed in memory atomically, only its execution in the adapter is processed that way. And the final result is only consistently visible to the application after a completion event is delivered. This rather huge compromise is because when atomics were first designed, there were no PCI primitives which supported atomics. An RDMA atomic_write operation is atomic all the way to memory. It must to be implemented in a platform operation which is similarly atomic. If implemented by a PCIe adapter, it would use the newer PCIe atomics to perform a locked read-modify-write. If implemented in software as you are doing, it must use a similar local r-m-w instruction, or the platform equivalent if necessary. Tom.