Re: [PATCH 1/1] eal: add 128-bit cmpset (x86-64 only)
From: Ola Liljedahl <hidden>
Date: 2019-02-01 19:01:39
On Fri, 2019-02-01 at 17:06 +0000, Eads, Gage wrote:
-----Original Message-----
From: Ola Liljedahl [mailto:Ola.Liljedahl@arm.com]
Sent: Monday, January 28, 2019 5:02 PM
To: Eads, Gage <redacted>; dev@dpdk.org
Cc: arybchenko@solarflare.com; jerinj@marvell.com; chaozhu@linux.vnet.ibm.com; nd [off-list ref]; Richardson, Bruce [off-list ref]; Ananyev, Konstantin [off-list ref]; hemant.agrawal@nxp.com; olivier.matz@6wind.com; Honnappa Nagarahalli [off-list ref]; Gavin Hu (Arm Technology China) [off-list ref]
Subject: Re: [dpdk-dev] [PATCH 1/1] eal: add 128-bit cmpset (x86-64 only)

On Mon, 2019-01-28 at 11:29 -0600, Gage Eads wrote:
This operation can be used for non-blocking algorithms, such as a non-blocking
stack or ring.

Signed-off-by: Gage Eads <redacted>
---
 .../common/include/arch/x86/rte_atomic_64.h        | 31 +++++++++++
 lib/librte_eal/common/include/generic/rte_atomic.h | 65 ++++++++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
index fd2ec9c53..b7b90b83e 100644
--- a/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_atomic_64.h
@@ -34,6 +34,7 @@
 /*
  * Inspired from FreeBSD src/sys/amd64/include/atomic.h
  * Copyright (c) 1998 Doug Rabson
+ * Copyright (c) 2019 Intel Corporation
  * All rights reserved.
  */
@@ -46,6 +47,7 @@
 #include <stdint.h>
 #include <rte_common.h>
+#include <rte_compat.h>
 #include <rte_atomic.h>

 /*------------------------- 64 bit atomic operations -------------------------*/
@@ -208,4 +210,33 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v)
 }
 #endif

+static inline int __rte_experimental

__rte_always_inline?
+rte_atomic128_cmpset(volatile rte_int128_t *dst,

No need to declare the location volatile. Volatile doesn't do what you think it does. https://youtu.be/lkgszkPnV8g?t=1027

I made this volatile to match the existing rte_atomicN_cmpset definitions, which presumably have a good reason for using the keyword. Maintainers, any input here?
+                     rte_int128_t *exp,

I would declare 'exp' const as well and document that 'exp' is not updated (with the old value) on failure. The reason is that ARMv8.0/AArch64 cannot atomically read the old value without also writing the location, and that is bad for performance (unnecessary writes lead to unnecessary contention and worse scalability). The user must anyway read the location (at the start of the critical section) using e.g. non-atomic 64-bit reads, so there isn't actually any requirement for an atomic 128-bit read of the location.

Will change in v2.
+                     rte_int128_t *src,

const rte_int128_t *src?

Sure, I don't see any harm in using const.
But why are we not passing 'exp' and 'src' by value? That works great, even with structs. Passing by value simplifies the compiler's life, especially if the call is inlined. Ask a compiler developer.

I ran objdump on the nb_stack code with both approaches, and pass-by-reference resulted in fewer overall x86_64 assembly ops.

PBV: 100 ops for push, 97 ops for pop
PBR: 92 ops for push, 84 ops for pop
OK I have never checked x86_64 code generation... I have good experiences with ARM/AArch64, everything seems to be done using registers. I am surprised there is a difference. Did a quick check with lfring, passing 'src' (third param) by reference and by value. No difference in code generation on x86_64. But if you insist let's go with PBR.
(Using the in-progress v5 nb_stack code)

Another factor -- though much less compelling -- is that with pass-by-reference, the user can create a 16B structure and cast it to rte_int128_t when they call rte_atomic128_cmpset, whereas with pass-by-value they need to put that struct in a union with rte_int128_t.
Which is what I always do nowadays... Trying to use as few casts as possible and lie to the compiler as seldom as possible. But I can see the freedom provided by taking a pointer to something and casting it to an rte_int128_t pointer in the call to rte_atomic128_cmpset().

Would prefer a name that is more similar to __atomic_compare_exchange(). E.g. rte_atomic128_compare_exchange() (or perhaps just rte_atomic128_cmpxchg)?

None of the rte_atomicXX_cmpset() functions take any memory order parameters. From an Arm perspective, we are not happy with that.
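To illustrate the naming and memory-order point, here is a minimal sketch of a compare-exchange wrapper shaped like the GCC/Clang __atomic_compare_exchange_n() builtin. The names example_atomic64_cmpxchg() and example_fetch_add() are made up for this mail and are not DPDK APIs. The sketch is 64-bit only so that it builds without any 128-bit support (cmpxchg16b or LDXP/STXP), and it assumes a GCC-compatible compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical wrapper (not a DPDK API): a compare-exchange that mirrors the
 * GCC/Clang __atomic_compare_exchange_n() builtin and lets the caller choose
 * the memory orders. On failure, *exp is updated with the current value.
 */
static inline bool
example_atomic64_cmpxchg(uint64_t *dst, uint64_t *exp, uint64_t src,
			 bool weak, int success, int failure)
{
	/* The builtin does not allow a release-type order on failure. */
	assert(failure != __ATOMIC_RELEASE && failure != __ATOMIC_ACQ_REL);
	return __atomic_compare_exchange_n(dst, exp, src, weak,
					   success, failure);
}

/* Example caller: lock-free add that returns the previous value. */
static inline uint64_t
example_fetch_add(uint64_t *ctr, uint64_t n)
{
	uint64_t old = __atomic_load_n(ctr, __ATOMIC_RELAXED);

	/* A weak CAS may fail spuriously, so it is used in a retry loop. */
	while (!example_atomic64_cmpxchg(ctr, &old, old + n, true,
					 __ATOMIC_ACQ_REL, __ATOMIC_RELAXED))
		;
	return old;
}
```

With this shape the caller states the ordering it needs at each call site, for example acquire-release on success and relaxed on failure as in the loop above, instead of always paying for the strongest ordering.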
+                     unsigned int weak,
+                     enum rte_atomic_memmodel_t success,
+                     enum rte_atomic_memmodel_t failure) {
+	RTE_SET_USED(weak);
+	RTE_SET_USED(success);
+	RTE_SET_USED(failure);
+	uint8_t res;
+
+	asm volatile (
+		      MPLOCKED
+		      "cmpxchg16b %[dst];"
+		      " sete %[res]"
+		      : [dst] "=m" (dst->val[0]),
+			"=A" (exp->val[0]),
+			[res] "=r" (res)
+		      : "c" (src->val[1]),
+			"b" (src->val[0]),
+			"m" (dst->val[0]),
+			"d" (exp->val[1]),
+			"a" (exp->val[0])
+		      : "memory");
+
+	return res;
+}
+
 #endif /* _RTE_ATOMIC_X86_64_H_ */
diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h b/lib/librte_eal/common/include/generic/rte_atomic.h
index b99ba4688..8d612d566 100644
--- a/lib/librte_eal/common/include/generic/rte_atomic.h
+++ b/lib/librte_eal/common/include/generic/rte_atomic.h
@@ -14,6 +14,7 @@
 #include <stdint.h>
 #include <rte_common.h>
+#include <rte_compat.h>

 #ifdef __DOXYGEN__
@@ -1082,4 +1083,68 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v)
 }
 #endif

+/*------------------------ 128 bit atomic operations -------------------------*/
+
+/**
+ * 128-bit integer structure.
+ */
+typedef struct {
+	uint64_t val[2];
+} __rte_aligned(16) rte_int128_t;

So we can't use __int128?

I'll put it in a union with val[2], in case any implementations want to use it.
Thinking on this one more time, since the inline asm functions (e.g. for x86_64
cmpxchg16b and for AArch64 LDXP/STXP) anyway will use 64-bit registers, it makes
most sense to make rte_int128_t a struct of 2x64b. The question is whether to
use an array like above or a struct with two elements (which I normally do
internally). Can you compare code generation with the following definition?

typedef struct {
	uint64_t lo, hi;
} __rte_aligned(16) rte_int128_t;

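As a starting point for that comparison, the sketch below puts the two candidate layouts side by side and checks at compile time that they have the same size, alignment and field offsets. The type and function names are hypothetical, and __attribute__((aligned(16))) stands in for __rte_aligned(16) on the assumption of a GCC/Clang toolchain. It only shows that the layouts are interchangeable in memory. It says nothing about code generation, which still needs to be checked with objdump.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout from the patch, under a hypothetical name. */
typedef struct {
	uint64_t val[2];
} __attribute__((aligned(16))) example_int128_arr_t;

/* Alternative two-member layout, also under a hypothetical name. */
typedef struct {
	uint64_t lo, hi;
} __attribute__((aligned(16))) example_int128_pair_t;

/* Both forms occupy 16 bytes with 16-byte alignment. */
static_assert(sizeof(example_int128_arr_t) == 16, "array form is 16 bytes");
static_assert(sizeof(example_int128_pair_t) == 16, "pair form is 16 bytes");
static_assert(alignof(example_int128_arr_t) == 16, "array form alignment");
static_assert(alignof(example_int128_pair_t) == 16, "pair form alignment");

/* 'lo' sits where val[0] sits, and 'hi' where val[1] sits. */
static_assert(offsetof(example_int128_pair_t, lo) == 0, "lo offset");
static_assert(offsetof(example_int128_pair_t, hi) == sizeof(uint64_t),
	      "hi offset");

/* Convert between the two forms without casts, by copying the bytes. */
static inline example_int128_arr_t
example_pair_to_arr(example_int128_pair_t p)
{
	example_int128_arr_t a;

	memcpy(&a, &p, sizeof(a));
	return a;
}
```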
Thanks, Gage [snip]
-- Ola Liljedahl, Networking System Architect, Arm Phone +46706866373, Skype ola.liljedahl