Re: [dpdk-dev] [PATCH v3 8/8] test/rcu: use compiler atomics for data sync
From: Joyce Kong <hidden>
Date: 2021-07-28 07:07:33
-----Original Message-----
From: Andrew Rybchenko <redacted>
Sent: Saturday, July 24, 2021 3:52 AM
To: Joyce Kong <redacted>; thomas@monjalon.net; david.marchand@redhat.com; roretzla@linux.microsoft.com; stephen@networkplumber.org; olivier.matz@6wind.com; harry.van.haaren@intel.com; Honnappa Nagarahalli [off-list ref]; Ruifeng Wang [off-list ref]
Cc: dev@dpdk.org; nd <redacted>
Subject: Re: [PATCH v3 8/8] test/rcu: use compiler atomics for data sync

On 7/20/21 6:51 AM, Joyce Kong wrote:
> Convert rte_atomic usages to compiler atomic built-ins in rcu_perf
> testcases.
>
> Signed-off-by: Joyce Kong <redacted>
> Reviewed-by: Ruifeng Wang <redacted>
> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
>  1 file changed, 49 insertions(+), 49 deletions(-)
>
> diff --git a/app/test/test_rcu_qsbr_perf.c b/app/test/test_rcu_qsbr_perf.c
> index 3017e71120..cf7b158d22 100644
> --- a/app/test/test_rcu_qsbr_perf.c
> +++ b/app/test/test_rcu_qsbr_perf.c
> @@ -30,8 +30,8 @@ static volatile uint32_t thr_id;
>  static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
>  static struct rte_hash *h;
>  static char hash_name[8];
> -static rte_atomic64_t updates, checks;
> -static rte_atomic64_t update_cycles, check_cycles;
> +static uint64_t updates, checks;
> +static uint64_t update_cycles, check_cycles;
>
>  /* Scale down results to 1000 operations to support lower
>   * granularity clocks.
> @@ -81,8 +81,8 @@ test_rcu_qsbr_reader_perf(void *arg)
>  	}
>
>  	cycles = rte_rdtsc_precise() - begin;
> -	rte_atomic64_add(&update_cycles, cycles);
> -	rte_atomic64_add(&updates, loop_cnt);
> +	__atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
> +	__atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);

Shouldn't __atomic_add_fetch() be used instead, since its pseudo-code is a bit simpler? What is the best option if the return value is not actually used?

If the return value is not used, as is the case here, gcc and clang generate the same instructions for __atomic_fetch_add() and __atomic_add_fetch() on x86 and Arm, in the cases I have tried.
If the return value is used, __atomic_add_fetch() needs two more instructions ('mov' and 'add') than __atomic_fetch_add() to return the calculated result.
This is based on experiments at https://godbolt.org/ .
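
For reference, a minimal sketch of the cases compared above, which can be pasted into a compiler explorer to inspect the generated code. The counter and function names are hypothetical and not taken from the patch; the actual instruction sequences depend on the compiler, version, flags, and target.

```c
#include <stdint.h>

/* Hypothetical counter standing in for the 'updates' variable in the test. */
uint64_t counter;

/* Return value ignored: both built-ins only need to perform the atomic add. */
void add_ignore_result(uint64_t n)
{
	__atomic_fetch_add(&counter, n, __ATOMIC_RELAXED);
	__atomic_add_fetch(&counter, n, __ATOMIC_RELAXED);
}

/* Return value used: fetch_add yields the value held before the addition. */
uint64_t add_return_old(uint64_t n)
{
	return __atomic_fetch_add(&counter, n, __ATOMIC_RELAXED);
}

/* Return value used: add_fetch yields the value held after the addition. */
uint64_t add_return_new(uint64_t n)
{
	return __atomic_add_fetch(&counter, n, __ATOMIC_RELAXED);
}
```

Only the last two functions differ in what they hand back to the caller, which is where any extra instructions would come from.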