Thread (88 messages) 88 messages, 7 authors, 2019-10-21

Re: [dpdk-dev] [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare exchange

From: Phil Yang (Arm Technology China) <hidden>
Date: 2019-07-19 13:56:33

-----Original Message-----
From: Jerin Jacob Kollanukkaran <redacted>
Sent: Friday, July 19, 2019 8:35 PM
To: Phil Yang (Arm Technology China) <redacted>; dev@dpdk.org
Cc: thomas@monjalon.net; hemant.agrawal@nxp.com; Honnappa
Nagarahalli [off-list ref]; Gavin Hu (Arm Technology
China) [off-list ref]; nd [off-list ref]; gage.eads@intel.com; nd
[off-list ref]
Subject: RE: [EXT] [PATCH v3 1/3] eal/arm64: add 128-bit atomic compare
exchange
quoted
quoted
quoted
+#define RTE_HAS_ACQ(mo) ((mo) != __ATOMIC_RELAXED && (mo) !=
+__ATOMIC_RELEASE) #define RTE_HAS_RLS(mo) ((mo) ==
__ATOMIC_RELEASE || \
+			 (mo) == __ATOMIC_ACQ_REL || \
+			 (mo) == __ATOMIC_SEQ_CST)
+
+#define RTE_MO_LOAD(mo)  (RTE_HAS_ACQ((mo)) \
+		? __ATOMIC_ACQUIRE : __ATOMIC_RELAXED) #define
RTE_MO_STORE(mo)
+(RTE_HAS_RLS((mo)) \
+		? __ATOMIC_RELEASE : __ATOMIC_RELAXED)
+
The one starts with RTE_ are public symbols, If it is generic enough,
Move to common layer so that every architecturse can use.
If you think, otherwise make it internal
Let's keep it internal. I will remove the 'RTE_' tag.
Probably change to __HAS_ACQ to avoid collision(just in case)
OK.
quoted
quoted

quoted
+#ifdef __ARM_FEATURE_ATOMICS
This define is added in gcc 9.1 and I believe for clang it is not supported
yet.
quoted
quoted
So old gcc and clang this will be undefined.
I think, With meson + native build, we  can find the presence of
ATOMIC support by running a.out. Not sure about make and cross build
case.
quoted
quoted
I don't want block this feature because of this, IMO, We can add this
code with  existing __ARM_FEATURE_ATOMICS scheme and later find a
method to enhance it. But please check how to fix it.
OK.
After thinking on this a bit, I think,  in order to support old gcc(< gcc 9.1) and
clang,
We can introduce a config option, where, by default it is disabled and enable
In specific config(where we know, lse is supported) and meson config.

i.e
#if defined(__ARM_FEATURE_ATOMICS) ||
defined(RTE_ARM_FEATURE_ATOMICS)
Cool
quoted
quoted
quoted
+#define __ATOMIC128_CAS_OP(cas_op_name, op_string)
\
quoted
quoted
quoted
+static inline rte_int128_t                                                  \
+cas_op_name(rte_int128_t *dst, rte_int128_t old,                            \
+		rte_int128_t updated)                                               \
+{                                                                           \
+	/* caspX instructions register pair must start from even-numbered
+	 * register at operand 1.
+	 * So, specify registers for local variables here.
+	 */                                                                     \
+	register uint64_t x0 __asm("x0") = (uint64_t)old.val[0];                \
Since direct x0 register used in the code and
cas_op_name() and rte_atomic128_cmp_exchange() is inline function,
Based on parent function load, we may corrupt x0 register aka
Since x0/x1 and x2/x3 are used a lot and often contain live values.
Maybe to change them to some relatively less frequently used registers
like
quoted
x14/x15 and x16/x17 might help for this case?
According to the PCS (Procedure Call Standard), x14-x17 are also temporary
registers.
X14-x17 are temporary registers but since
cas_op_name() and rte_atomic128_cmp_exchange() are inline functions,
Based on the parent function register usage, it _may_ corrupt.
Just checked how Linux Kernel does similar things:
https://github.com/torvalds/linux/blob/master/arch/arm64/include/asm/atomic_lse.h#L19 

Same methods.

I will finish the benchmarking for the no_inline approach. If it has no significant performance loss, I think we can make it as no_inline.  
quoted
quoted
Break arm64 ABI. Not sure clobber list will help here or not?
In my understanding, for the register variable, if it contains a live value in
the
quoted
specified register, the compiler will move the live value into a free register.
Since x0~x3 are present in the input/output operands and x0/x1's value
needs to
quoted
be restored to the variable 'old' as a return value.
So I didn't add them into the clobber list.
OK
quoted
quoted
Making it as no_inline will help but not sure about the performance
impact.
quoted
quoted
May be you can check with compiler team.

We burned our hands with this scheme, see
5b40ec6b966260e0ff66a8a2c689664f75d6a0e6 ("mempool/octeontx2: fix
possible arm64 ABI break")

Probably we can choose a scheme for rc2 and adjust as when we have
complete clarity.
quoted
+#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_ARM64)
There is nothing specific to x86 and arm64 here, Can we remove this
#ifdef ?
quoted
Without this constraint, it will break 32-bit x86 builds.
http://mails.dpdk.org/archives/test-report/2019-June/086586.html
OK . #ifdef RTE_ARCH_64 would help then.
OK.
quoted
quoted
quoted
+/**
+ * 128-bit integer structure.
+ */
+RTE_STD_C11
+typedef struct {
+	RTE_STD_C11
+	union {
+		uint64_t val[2];
+		__extension__ __int128 int128;
Instead of guarding  RTE_ARCH_64 on this complete structure,
How about it only under
#ifdef RTE_ARCH_64
__extension__ __int128 int128;
#endif
So that it rte_int128_t will be available for 32bit as well.
Agree, it should be work. But I am not sure. 

Hi Gage,

How do you think about this? 
quoted
quoted
quoted
+	};
+} __rte_aligned(16) rte_int128_t;
+#endif
+
 #ifdef __DOXYGEN__
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help