Re: [PATCH 1/1] eal: add 128-bit cmpset (x86-64 only)
From: Honnappa Nagarahalli <hidden>
Date: 2019-02-04 18:33:26
>> On Mon, 2019-01-28 at 11:29 -0600, Gage Eads wrote:
>>> This operation can be used for non-blocking algorithms, such as a
>>> non-blocking stack or ring.
>>>
>>> Signed-off-by: Gage Eads <redacted>
>>> ---
>>>  .../common/include/arch/x86/rte_atomic_64.h        | 31 +++++++++++
>>>  lib/librte_eal/common/include/generic/rte_atomic.h | 65 ++++++++++++++++++++++
>>>  2 files changed, 96 insertions(+)
>>>
>>> diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
>>> index fd2ec9c53..b7b90b83e 100644
>>> --- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
>>> +++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
>>> @@ -34,6 +34,7 @@
>>>  /*
>>>   * Inspired from FreeBSD src/sys/amd64/include/atomic.h
>>>   * Copyright (c) 1998 Doug Rabson
>>> + * Copyright (c) 2019 Intel Corporation
>>>   * All rights reserved.
>>>   */
>>> @@ -46,6 +47,7 @@
>>>  #include <stdint.h>
>>>  #include <rte_common.h>
>>> +#include <rte_compat.h>
>>>  #include <rte_atomic.h>
>>>  /*------------------------- 64 bit atomic operations -------------------------*/
>>> @@ -208,4 +210,33 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v)
>>>  }
>>>  #endif
>>>
>>> +static inline int __rte_experimental
>> __rte_always_inline?
>>> +rte_atomic128_cmpset(volatile rte_int128_t *dst,
>> No need to declare the location volatile. Volatile doesn't do what you
>> think it does.
>> https://youtu.be/lkgszkPnV8g?t=1027
> I made this volatile to match the existing rte_atomicN_cmpset definitions,
> which presumably have a good reason for using the keyword. Maintainers, any
> input here?
>>> +		     rte_int128_t *exp,
>> I would declare 'exp' const as well and document that 'exp' is not updated
>> (with the old value) for a failure. The reason being that ARMv8.0/AArch64
>> cannot atomically read the old value without also writing the location and
>> that is bad for performance (unnecessary writes lead to unnecessary
>> contention and worse scalability). And the user must anyway read the
>> location (at the start of the critical section) using e.g. non-atomic
>> 64-bit reads so there isn't actually any requirement for an atomic 128-bit
>> read of the location.
> Will change in v2.

IMO, we should not change the definition of this API, because
1) This API will differ from __atomic_compare_exchange_n API. It will be a
   new API to learn for the users.
2) The definition in this patch will make it easy to replace this API call
   with __atomic_xxx API (whenever it supports 128b natively on all the
   platforms)
3) I do not see any documentation in [1] indicating that the 'read on
   failure' will be an atomic read.

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
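
For reference, a minimal sketch of the __atomic_compare_exchange_n behavior that point 1 refers to: on failure the builtin writes the value it observed back into the 'expected' argument. The sketch uses a 64-bit location so that it builds without 128-bit support; cas64, atomic_add_return_old and cas_failure_demo are hypothetical names for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): one strong CAS on a 64-bit
 * location using the GCC/Clang builtin. Returns true on success. On
 * failure the builtin stores the value it found in *dst into *exp.
 */
static inline bool
cas64(uint64_t *dst, uint64_t *exp, uint64_t src)
{
	return __atomic_compare_exchange_n(dst, exp, src,
					   false,             /* strong CAS */
					   __ATOMIC_ACQ_REL,  /* success order */
					   __ATOMIC_ACQUIRE); /* failure order */
}

/* Retry loop: a failed CAS refreshes 'old', so no explicit reload. */
static inline uint64_t
atomic_add_return_old(uint64_t *dst, uint64_t inc)
{
	uint64_t old = __atomic_load_n(dst, __ATOMIC_RELAXED);

	while (!cas64(dst, &old, old + inc))
		; /* 'old' now holds the value seen by the failed CAS */

	return old;
}

/* Shows the 'expected is updated on failure' behavior. */
static inline void
cas_failure_demo(void)
{
	uint64_t loc = 5, exp = 7;

	assert(!cas64(&loc, &exp, 9)); /* 7 != 5, so the CAS fails */
	assert(exp == 5);              /* exp was overwritten with *dst */
	assert(loc == 5);              /* the location is unchanged */
	assert(cas64(&loc, &exp, 9));  /* retry with refreshed exp succeeds */
	assert(loc == 9);
}
```

Whether the 128-bit variant should keep this write-back, or take a const 'exp' as suggested above, is exactly the open question in this thread; the sketch only shows what the existing builtin does.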
>>> +		     rte_int128_t *src,
>> const rte_int128_t *src?
> Sure, I don't see any harm in using const.
>> But why are we not passing 'exp' and 'src' by value? That works great,
>> even with structs. Passing by value simplifies the compiler's life,
>> especially if the call is inlined. Ask a compiler developer.
> I ran objdump on the nb_stack code with both approaches, and
> pass-by-reference resulted in fewer overall x86_64 assembly ops.
>
> PBV: 100 ops for push, 97 ops for pop
> PBR: 92 ops for push, 84 ops for pop
>
> (Using the in-progress v5 nb_stack code)
>
> Another factor (though much less compelling) is that with
> pass-by-reference, the user can create a 16B structure and cast it to
> rte_int128_t when they call rte_atomic128_cmpset, whereas with
> pass-by-value they need to put that struct in a union with rte_int128_t.
>>> +		     unsigned int weak,
>>> +		     enum rte_atomic_memmodel_t success,
>>> +		     enum rte_atomic_memmodel_t failure) {
>>> +	RTE_SET_USED(weak);
>>> +	RTE_SET_USED(success);
>>> +	RTE_SET_USED(failure);
>>> +	uint8_t res;
>>> +
>>> +	asm volatile (
>>> +		      MPLOCKED
>>> +		      "cmpxchg16b %[dst];"
>>> +		      " sete %[res]"
>>> +		      : [dst] "=m" (dst->val[0]),
>>> +			"=A" (exp->val[0]),
>>> +			[res] "=r" (res)
>>> +		      : "c" (src->val[1]),
>>> +			"b" (src->val[0]),
>>> +			"m" (dst->val[0]),
>>> +			"d" (exp->val[1]),
>>> +			"a" (exp->val[0])
>>> +		      : "memory");
>>> +
>>> +	return res;
>>> +}
>>> +
>>>  #endif /* _RTE_ATOMIC_X86_64_H_ */
>>> diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h b/lib/librte_eal/common/include/generic/rte_atomic.h
>>> index b99ba4688..8d612d566 100644
>>> --- a/lib/librte_eal/common/include/generic/rte_atomic.h
>>> +++ b/lib/librte_eal/common/include/generic/rte_atomic.h
>>> @@ -14,6 +14,7 @@
>>>  #include <stdint.h>
>>>  #include <rte_common.h>
>>> +#include <rte_compat.h>
>>>  #ifdef __DOXYGEN__
>>> @@ -1082,4 +1083,68 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v)
>>>  }
>>>  #endif
>>> +/*------------------------ 128 bit atomic operations -------------------------*/
>>> +
>>> +/**
>>> + * 128-bit integer structure.
>>> + */
>>> +typedef struct {
>>> +	uint64_t val[2];
>>> +} __rte_aligned(16) rte_int128_t;
>> So we can't use __int128?
> I'll put it in a union with val[2], in case any implementations want to
> use it.
>
> Thanks,
> Gage

[snip]
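
A minimal sketch of what "a union with val[2]" might look like, assuming a GCC/Clang 64-bit target where the __int128 extension is available. The type name example_int128_t, the member name int128 and the helper functions are assumptions for illustration only; the layout actually chosen for v2 is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative stand-in for the patch's rte_int128_t (hypothetical name).
 * Both members overlay the same 16 bytes, so an implementation can use
 * either the two 64-bit halves or the compiler's native 128-bit type.
 */
typedef struct {
	union {
		uint64_t val[2];
		__extension__ __int128 int128; /* GCC/Clang extension */
	};
} __attribute__((aligned(16))) example_int128_t;

/* cmpxchg16b requires a 16-byte, 16-byte-aligned memory operand. */
_Static_assert(sizeof(example_int128_t) == 16, "must be 16 bytes");
_Static_assert(_Alignof(example_int128_t) == 16, "must be 16B aligned");

/* Build a value from its two 64-bit halves. */
static inline example_int128_t
example_int128_make(uint64_t v0, uint64_t v1)
{
	example_int128_t v;

	v.val[0] = v0;
	v.val[1] = v1;
	return v;
}

/* Copying through the 'int128' view copies both 'val' halves. */
static inline void
example_int128_copy_check(void)
{
	example_int128_t a = example_int128_make(1, 2);
	example_int128_t b;

	b.int128 = a.int128;
	assert(b.val[0] == 1 && b.val[1] == 2);
}
```

Keeping val[2] as a member means the inline assembly above, which addresses val[0] and val[1] directly, would not need to change if a union like this were adopted.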