Re: [dpdk-dev] [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare exchange
From: Jerin Jacob Kollanukkaran <hidden>
Date: 2019-07-19 06:25:08
-----Original Message-----
From: Phil Yang <redacted>
Sent: Friday, June 28, 2019 1:42 PM
To: dev@dpdk.org
Cc: thomas@monjalon.net; Jerin Jacob Kollanukkaran <redacted>; hemant.agrawal@nxp.com; Honnappa.Nagarahalli@arm.com; gavin.hu@arm.com; nd@arm.com; gage.eads@intel.com
Subject: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare exchange

External Email
----------------------------------------------------------------------
Add 128-bit atomic compare exchange on aarch64.

Signed-off-by: Phil Yang <redacted>
Tested-by: Honnappa Nagarahalli <redacted>
Reviewed-by: Honnappa Nagarahalli <redacted>
---
+#define RTE_HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) != __ATOMIC_RELEASE)
+#define RTE_HAS_RLS(mo) ((mo) == __ATOMIC_RELEASE || \
+ (mo) == __ATOMIC_ACQ_REL || \
+ (mo) == __ATOMIC_SEQ_CST)
+
+#define RTE_MO_LOAD(mo) (RTE_HAS_ACQ((mo)) \
+ ? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED)
+#define RTE_MO_STORE(mo) (RTE_HAS_RLS((mo)) \
+ ? __ATOMIC_RELEASE : __ATOMIC_RELAXED)
+
The symbols that start with RTE_ are public. If they are generic enough, move them to the common layer so that every architecture can use them. Otherwise, make them internal.
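For anyone judging whether these macros are generic enough, here is a minimal standalone sketch of the mapping they perform: a combined memory order is split into a load-side order and a store-side order. The macro bodies are copied from the quoted patch; the helper mo_split_is_consistent() is hypothetical and exists only to spell out the expected results. It assumes a GCC or Clang compiler, since the __ATOMIC_* constants are compiler builtins.

```c
/* Macros as quoted in the patch; __ATOMIC_* are GCC/Clang predefined macros. */
#define RTE_HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) != __ATOMIC_RELEASE)
#define RTE_HAS_RLS(mo) ((mo) == __ATOMIC_RELEASE || \
			 (mo) == __ATOMIC_ACQ_REL || \
			 (mo) == __ATOMIC_SEQ_CST)

#define RTE_MO_LOAD(mo)  (RTE_HAS_ACQ((mo)) \
		? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED)
#define RTE_MO_STORE(mo) (RTE_HAS_RLS((mo)) \
		? __ATOMIC_RELEASE : __ATOMIC_RELAXED)

/*
 * Hypothetical helper, not part of the patch: returns 1 when every
 * memory order splits into the expected load/store pair.
 */
static inline int
mo_split_is_consistent(void)
{
	return RTE_MO_LOAD(__ATOMIC_RELAXED) == __ATOMIC_RELAXED &&
	       RTE_MO_STORE(__ATOMIC_RELAXED) == __ATOMIC_RELAXED &&
	       /* acquire only affects the load side */
	       RTE_MO_LOAD(__ATOMIC_ACQUIRE) == __ATOMIC_ACQUIRE &&
	       RTE_MO_STORE(__ATOMIC_ACQUIRE) == __ATOMIC_RELAXED &&
	       /* release only affects the store side */
	       RTE_MO_LOAD(__ATOMIC_RELEASE) == __ATOMIC_RELAXED &&
	       RTE_MO_STORE(__ATOMIC_RELEASE) == __ATOMIC_RELEASE &&
	       /* acq_rel and seq_cst affect both sides */
	       RTE_MO_LOAD(__ATOMIC_ACQ_REL) == __ATOMIC_ACQUIRE &&
	       RTE_MO_STORE(__ATOMIC_ACQ_REL) == __ATOMIC_RELEASE &&
	       RTE_MO_LOAD(__ATOMIC_SEQ_CST) == __ATOMIC_ACQUIRE &&
	       RTE_MO_STORE(__ATOMIC_SEQ_CST) == __ATOMIC_RELEASE;
}
```

Nothing in the sketch depends on aarch64, which is the property that would matter for moving the macros to the common layer.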
+#ifdef __ARM_FEATURE_ATOMICS
This define was added in gcc 9.1, and I believe clang does not support it yet, so it will be undefined with older gcc and with clang. I think that with meson + a native build we can detect ATOMIC support by running a.out. I am not sure about the make and cross-build cases. I don't want to block this feature because of this. IMO, we can add this code with the existing __ARM_FEATURE_ATOMICS scheme and find a way to enhance it later. But please check how to fix it.
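As a rough illustration of the "run a.out" idea, a native build could probe the CPU at run time instead of relying on the compiler define. The sketch below is only one possible approach under stated assumptions: it targets Linux, uses getauxval(AT_HWCAP) from <sys/auxv.h>, and falls back to the arm64 kernel's bit value for HWCAP_ATOMICS if the libc headers do not provide it. The function name cpu_has_lse_atomics() is hypothetical and not part of DPDK or the patch.

```c
#include <stdbool.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_ATOMICS
/* Assumption: bit 8, as defined in the Linux arm64 uapi hwcap header. */
#define HWCAP_ATOMICS (1UL << 8)
#endif
#endif

/*
 * Hypothetical helper: reports whether the LSE atomic instructions
 * (including the CASP family) can be expected to work.
 */
static inline bool
cpu_has_lse_atomics(void)
{
#if defined(__ARM_FEATURE_ATOMICS)
	/* The compiler was already told to target LSE atomics. */
	return true;
#elif defined(__linux__) && defined(__aarch64__)
	/* Ask the kernel which CPU features it exposes to user space. */
	return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
	/* Not Linux/aarch64: no probe available in this sketch. */
	return false;
#endif
}
```

A build-time probe would wrap this in a main() that returns the result as its exit status. This does not help the cross-build case, where the probe cannot be run on the build machine.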
+#define __ATOMIC128_CAS_OP(cas_op_name, op_string) \
+static inline rte_int128_t \
+cas_op_name(rte_int128_t *dst, rte_int128_t old, \
+ rte_int128_t updated) \
+{ \
+ /* caspX instructions register pair must start from even-numbered
+ * register at operand 1.
+ * So, specify registers for local variables here.
+ */ \
+ register uint64_t x0 __asm("x0") = (uint64_t)old.val[0]; \
Since the x0 register is used directly in the code, and cas_op_name() and
rte_atomic128_cmp_exchange() are inline functions, we may corrupt the x0
register depending on the register usage of the parent function, i.e.
break the arm64 ABI. I am not sure whether the clobber list will help here.
Making it noinline would help, but I am not sure about the performance impact.
Maybe you can check with the compiler team.
We burned our hands with this scheme; see
5b40ec6b966260e0ff66a8a2c689664f75d6a0e6 ("mempool/octeontx2: fix possible arm64 ABI break")
Probably we can choose a scheme for rc2 and adjust it when we have complete clarity.
+ register uint64_t x1 __asm("x1") = (uint64_t)old.val[1]; \
+ register uint64_t x2 __asm("x2") = (uint64_t)updated.val[0]; \
+ register uint64_t x3 __asm("x3") = (uint64_t)updated.val[1]; \
+ asm volatile( \
+ op_string " %[old0], %[old1], %[upd0], %[upd1], [%[dst]]" \
+ : [old0] "+r" (x0), \
+ [old1] "+r" (x1) \
+ : [upd0] "r" (x2), \
+ [upd1] "r" (x3), \
+ [dst] "r" (dst) \
+ : "memory"); \
Shouldn't we add x0, x1, x2, x3 to the clobber list?
static inline int __rte_experimental
rte_atomic128_cmp_exchange(rte_int128_t *dst, rte_int128_t *exp,
diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h b/lib/librte_eal/common/include/generic/rte_atomic.h
index 9958543..2355e50 100644
--- a/lib/librte_eal/common/include/generic/rte_atomic.h
+++ b/lib/librte_eal/common/include/generic/rte_atomic.h
@@ -1081,6 +1081,20 @@ static inline void rte_atomic64_clear(rte_atomic64_t *v)

/*------------------------ 128 bit atomic operations -------------------------*/

+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
There is nothing specific to x86 and arm64 here. Can we remove this #ifdef?
+/**
+ * 128-bit integer structure.
+ */
+RTE_STD_C11
+typedef struct {
+ RTE_STD_C11
+ union {
+ uint64_t val[2];
+ __extension__ __int128 int128;
+ };
+} __rte_aligned(16) rte_int128_t;
+#endif
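To make the layout of this type concrete, here is a minimal standalone sketch of an equivalent structure that builds without DPDK headers. It assumes a GCC or Clang compiler on a 64-bit target (where the __int128 extension is available) and spells out __rte_aligned(16) as the underlying alignment attribute. The names int128_sketch_t and int128_sketch_make() are hypothetical stand-ins, not part of the patch.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/*
 * Stand-in for the quoted rte_int128_t: two 64-bit words overlaid on
 * one 128-bit integer, aligned to 16 bytes.
 */
typedef struct {
	union {
		uint64_t val[2];
		__extension__ __int128 int128;
	};
} __attribute__((__aligned__(16))) int128_sketch_t;

/* The 16-byte size and alignment are what a 128-bit CAS relies on. */
static_assert(sizeof(int128_sketch_t) == 16, "type must be 16 bytes");
static_assert(alignof(int128_sketch_t) == 16, "type must be 16-byte aligned");

/* Hypothetical helper: build a value from its two 64-bit words. */
static inline int128_sketch_t
int128_sketch_make(uint64_t w0, uint64_t w1)
{
	int128_sketch_t v;

	v.val[0] = w0;
	v.val[1] = w1;
	return v;
}
```

The sketch contains no architecture checks of its own; it only needs a compiler that provides __int128.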
+
#ifdef __DOXYGEN__