Re: [dpdk-dev] [PATCH v2 1/2] lib/eal: add amd epyc2 memcpy routine to eal
From: Thomas Monjalon <hidden>
Date: 2021-10-21 19:50:13
21/10/2021 21:03, Song, Keesang:
From: Thomas Monjalon <redacted>quoted
21/10/2021 20:12, Song, Keesang:quoted
From: Ananyev, Konstantin <redacted>quoted
21/10/2021 19:10, Song, Keesang:quoted
19/10/2021 17:35, Stephen Hemminger:quoted
From: Thomas Monjalon <redacted>quoted
19/10/2021 12:47, Aman Kumar:quoted
This patch provides rte_memcpy* calls optimized for AMD EPYC platforms. Use config/x86/x86_amd_epyc_linux_gcc as cross-file with meson to build dpdk for AMD EPYC platforms.Please split in 2 patches: platform & memcpy. What optimization is specific to EPYC? I dislike the asm code below. What is AMD specific inside? Can it use compiler intrinsics as it is done elsewhere?And why is this not done by Gcc?I hope this can make some explanation to your question. We(AMD Linux library support team) have implemented the custom tailored memcpy solution which is a close match with DPDK use case requirements like the below. 1) Min 64B length data packet with cache aligned Source and Destination. 2) Non-Temporal load and temporal store for cache aligned source for both RX and TX paths. Could not implement the non-temporal store for TX_PATH, as non-Temporal load/stores works only with 32B aligned addresses for AVX2 3) This solution works for all AVX2 supported AMD machines. Internally we have completed the integrity testing and benchmarking of the solution and found gains of 8.4% to 14.5% specifically on Milan CPU(3rd Gen of EPYC Processor)It still not clear to me why it has to be written in assembler. Why similar stuff can't be written in C with instincts, as rest of rte_memcpy.h does?The current memcpy implementation in Glibc is based out of assembly coding. Although memcpy could have been implemented with intrinsic, but since our AMD library developers are working on the Glibc functions, they have provided a tailored implementation based out of inline assembly coding.Please convert it to C code, thanks.I've already asked our AMD tools team, but they're saying they are not really familiar with C code implementation. We need your approval for now since we really need to get this patch submitted to 21.11 LTS.
Not sure it is urgent given that v2 came after the planned -rc1 date, after 6 weeks of silence. About the approval, there are already 3 technical board members (Konstantin, Stephen and me) objecting against this patch. Not being familiar with C code when working on CPU optimization in 2021 is a strange argument. In general, I don't really understand why we should maintain memcpy functions in DPDK instead of relying on libc optimizations. Having big asm code to maintain and debug is not helping. I think this case shows that AMD needs to become more familiar with DPDK schedule and expectations. I would encourage you to contribute more in the project, so such misunderstanding won't happen in future. Hope that's all understandable PS: discussion is more readable with replies below