Re: [dpdk-dev] [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
From: Zhang, Qi Z <hidden>
Date: 2021-07-08 14:26:21
> -----Original Message-----
> From: Lance Richardson <redacted>
> Sent: Thursday, July 8, 2021 9:51 PM
> To: Zhang, Qi Z <redacted>
> Cc: Joyce Kong <redacted>; Xing, Beilei <redacted>; ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce [off-list ref]; Zhang, Helin [off-list ref]; dev@dpdk.org; stable@dpdk.org; nd@arm.com
> Subject: Re: [dpdk-dev] [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
>
> On Thu, Jul 8, 2021 at 8:09 AM Zhang, Qi Z [off-list ref] wrote:
> >
> > > -----Original Message-----
> > > From: Joyce Kong <redacted>
> > > Sent: Tuesday, July 6, 2021 2:54 PM
> > > To: Xing, Beilei <redacted>; Zhang, Qi Z [off-list ref];
> > > ruifeng.wang@arm.com; honnappa.nagarahalli@arm.com; Richardson, Bruce
> > > [off-list ref]; Zhang, Helin [off-list ref]
> > > Cc: dev@dpdk.org; stable@dpdk.org; nd@arm.com
> > > Subject: [PATCH v3 2/2] net/i40e: replace SMP barrier with thread fence
> > >
> > > Simply replace the SMP barrier with atomic thread fence for i40e hw ring
> > > scan, if there is no synchronization point.
> > >
> > > Signed-off-by: Joyce Kong <redacted>
> > > Reviewed-by: Ruifeng Wang <redacted>
> > > ---
> > >  drivers/net/i40e/i40e_rxtx.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
> > > index 9aaabfd92..86e2f083e 100644
> > > --- a/drivers/net/i40e/i40e_rxtx.c
> > > +++ b/drivers/net/i40e/i40e_rxtx.c
> > > @@ -482,7 +482,8 @@ i40e_rx_scan_hw_ring(struct i40e_rx_queue *rxq)
> > >  I40E_RXD_QW1_STATUS_SHIFT;
> > >  }
> > >  
> > > -rte_smp_rmb();
> > > +/* This barrier is to order loads of different words in the descriptor */
> > > +rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
> >
> > Now for x86, you actually replace a compiler barrier with a memory fence;
> > this may have a potential performance impact, which needs additional
> > resources to investigate.
>
> No memory fence instruction is generated for __ATOMIC_ACQUIRE on x86 for
> any version of gcc or clang that I've tried, based on experiments here:
> https://godbolt.org/z/Yxr1vGhKP
Nice tool! I tried writing some dummy code with and without __atomic_thread_fence(__ATOMIC_ACQUIRE), but I didn't see any difference in the generated assembly. Does that mean __atomic_thread_fence(__ATOMIC_ACQUIRE) does nothing on x86?