Re: [dpdk-dev] [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare exchange
From: Phil Yang (Arm Technology China) <hidden>
Date: 2019-07-19 13:56:33
-----Original Message----- From: Jerin Jacob Kollanukkaran <redacted> Sent: Friday, July 19, 2019 8:35 PM To: Phil Yang (Arm Technology China) <redacted>; dev@dpdk.org Cc: thomas@monjalon.net; hemant.agrawal@nxp.com; Honnappa Nagarahalli [off-list ref]; Gavin Hu (Arm Technology China) [off-list ref]; nd [off-list ref]; gage.eads@intel.com; nd [off-list ref] Subject: RE: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare exchangequoted
quoted
quoted
+#define RTE_HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) != +__ATOMIC_RELEASE) #define RTE_HAS_RLS(mo) ((mo) == __ATOMIC_RELEASE || \ + (mo) == __ATOMIC_ACQ_REL || \ + (mo) == __ATOMIC_SEQ_CST) + +#define RTE_MO_LOAD(mo) (RTE_HAS_ACQ((mo)) \ + ? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED) #define RTE_MO_STORE(mo) +(RTE_HAS_RLS((mo)) \ + ? __ATOMIC_RELEASE : __ATOMIC_RELAXED) +The one starts with RTE_ are public symbols, If it is generic enough, Move to common layer so that every architecturse can use. If you think, otherwise make it internalLet's keep it internal. I will remove the 'RTE_' tag.Probably change to __HAS_ACQ to avoid collision(just in case)
OK.
quoted
quoted
quoted
+#ifdef __ARM_FEATURE_ATOMICSThis define is added in gcc 9.1 and I believe for clang it is not supportedyet.quoted
quoted
So old gcc and clang this will be undefined. I think, With meson + native build, we can find the presence of ATOMIC support by running a.out. Not sure about make and cross buildcase.quoted
quoted
I don't want block this feature because of this, IMO, We can add this code with existing __ARM_FEATURE_ATOMICS scheme and later find a method to enhance it. But please check how to fix it.OK.After thinking on this a bit, I think, in order to support old gcc(< gcc 9.1) and clang, We can introduce a config option, where, by default it is disabled and enable In specific config(where we know, lse is supported) and meson config. i.e #if defined(__ARM_FEATURE_ATOMICS) || defined(RTE_ARM_FEATURE_ATOMICS)
Cool
quoted
quoted
quoted
+#define __ATOMIC128_CAS_OP(cas_op_name, op_string)\quoted
quoted
quoted
+static inline rte_int128_t \ +cas_op_name(rte_int128_t *dst, rte_int128_t old, \ + rte_int128_t updated) \ +{ \ + /* caspX instructions register pair must start from even-numbered + * register at operand 1. + * So, specify registers for local variables here. + */ \ + register uint64_t x0 __asm("x0") = (uint64_t)old.val[0]; \Since direct x0 register used in the code and cas_op_name() and rte_atomic128_cmp_exchange() is inline function, Based on parent function load, we may corrupt x0 register akaSince x0/x1 and x2/x3 are used a lot and often contain live values. Maybe to change them to some relatively less frequently used registerslikequoted
x14/x15 and x16/x17 might help for this case? According to the PCS (Procedure Call Standard), x14-x17 are also temporary registers.X14-x17 are temporary registers but since cas_op_name() and rte_atomic128_cmp_exchange() are inline functions, Based on the parent function register usage, it _may_ corrupt.
Just checked how Linux Kernel does similar things: https://github.com/torvalds/linux/blob/master/arch/arm64/include/asm/atomic_lse.h#L19 Same methods. I will finish the benchmarking for the no_inline approach. If it has no significant performance loss, I think we can make it as no_inline.
quoted
quoted
Break arm64 ABI. Not sure clobber list will help here or not?In my understanding, for the register variable, if it contains a live value inthequoted
specified register, the compiler will move the live value into a free register. Since x0~x3 are present in the input/output operands and x0/x1's valueneeds toquoted
be restored to the variable 'old' as a return value. So I didn't add them into the clobber list.OKquoted
quoted
Making it as no_inline will help but not sure about the performanceimpact.quoted
quoted
May be you can check with compiler team. We burned our hands with this scheme, see 5b40ec6b966260e0ff66a8a2c689664f75d6a0e6 ("mempool/octeontx2: fix possible arm64 ABI break") Probably we can choose a scheme for rc2 and adjust as when we have complete clarity.quoted
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)There is nothing specific to x86 and arm64 here, Can we remove this#ifdef ?quoted
Without this constraint, it will break 32-bit x86 builds. http://mails.dpdk.org/archives/test-report/2019-June/086586.htmlOK . #ifdef RTE_ARCH_64 would help then.
OK.
quoted
quoted
quoted
+/** + * 128-bit integer structure. + */ +RTE_STD_C11 +typedef struct { + RTE_STD_C11 + union { + uint64_t val[2]; + __extension__ __int128 int128;Instead of guarding RTE_ARCH_64 on this complete structure, How about it only under #ifdef RTE_ARCH_64 __extension__ __int128 int128; #endif So that it rte_int128_t will be available for 32bit as well.
Agree, it should be work. But I am not sure. Hi Gage, How do you think about this?
quoted
quoted
quoted
+ }; +} __rte_aligned(16) rte_int128_t; +#endif + #ifdef __DOXYGEN__