Thread (43 messages), 9 authors, 2021-11-01

Re: [dpdk-dev] [PATCH v3 3/3] lib/eal: add temporal store memcpy support on AMD platform

From: Aman Kumar <hidden>
Date: 2021-10-27 06:35:09

On Tue, Oct 26, 2021 at 9:44 PM Thomas Monjalon [off-list ref] wrote:
> 26/10/2021 17:56, Aman Kumar:
>> This patch provides a rte_memcpy* call with temporal stores.
>> Use -Dcpu_instruction_set=znverX with build to enable this API.
>>
>> Signed-off-by: Aman Kumar <redacted>
>> ---
>>  config/x86/meson.build           |   2 +
>>  lib/eal/x86/include/rte_memcpy.h | 114 +++++++++++++++++++++++++++++++
>
> It looks better as C code.
> Do you achieve the same performance as the asm version?

In a few corner cases assembly performed better, but overall we have very
similar perf observations.

>> +#if defined RTE_MEMCPY_AMDEPYC
> [...]
>> +static __rte_always_inline void *
>> +rte_memcpy_aligned_tstore16_generic(void *dst, void *src, int len)
>
> So to be clear, an application will benefit from this optimization if
> 1/ DPDK is specifically compiled for AMD
> 2/ the application is compiled with the above DPDK build (because of
> inlining)
>
> I guess there is no good way to benefit from the optimization
> without specific compilation, because of the inlining constraint.
> Another design, with fewer constraints but less performance,
> would be to have a function pointer assigned at runtime based on the CPU.

You're right. We need to build DPDK and apps with this flag enabled to get
the benefit.
In future versions, we will try to adapt in a more dynamic way. Thanks.
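
For illustration only, the runtime function-pointer design mentioned above
could look roughly like the sketch below. It is not code from this patch or
from DPDK: the names copy_fn_t, copy_generic, copy_avx2_placeholder,
cpu_has_avx2 and app_copy are all hypothetical, the AVX2 check is an assumed
selection criterion, and the "optimized" routine is only a stand-in that calls
memcpy. CPU detection relies on the GCC/Clang builtin
__builtin_cpu_supports and falls back to the generic path elsewhere.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by every copy implementation. */
typedef void *(*copy_fn_t)(void *dst, const void *src, size_t len);

/* Portable fallback used when no CPU-specific routine applies. */
static void *copy_generic(void *dst, const void *src, size_t len)
{
    return memcpy(dst, src, len);
}

/*
 * Stand-in for a CPU-specific routine. A real one would use vector
 * intrinsics; this placeholder only shows where it would plug in.
 */
static void *copy_avx2_placeholder(void *dst, const void *src, size_t len)
{
    return memcpy(dst, src, len);
}

/* Assumed selection criterion: AVX2 support reported by the compiler builtin. */
static int cpu_has_avx2(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}

/* Selected implementation; NULL until the first call resolves it. */
static copy_fn_t copy_impl;

/*
 * Resolve the implementation once, then call through the pointer.
 * The lazy initialization here is not thread-safe; a real library
 * would more likely assign the pointer during its init phase.
 */
void *app_copy(void *dst, const void *src, size_t len)
{
    if (copy_impl == NULL)
        copy_impl = cpu_has_avx2() ? copy_avx2_placeholder : copy_generic;
    return copy_impl(dst, src, len);
}
```

A caller would use app_copy(dst, src, len) exactly like memcpy. The trade-off
is the one described in the quoted text: the same binary works on any CPU, but
every call goes through a pointer and cannot be inlined into the caller.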