Re: [PATCH v1 1/2] rte_memcmp functions using Intel AVX and SSE intrinsics
From: Ferruh Yigit <hidden>
Date: 2018-12-20 23:30:13
On 5/26/2016 9:57 AM, zhihong.wang at intel.com (Wang, Zhihong) wrote:
>> -----Original Message-----
>> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur
>> Sent: Tuesday, March 8, 2016 7:01 AM
>> To: dev at dpdk.org
>> Subject: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and SSE intrinsics
>>
>> v1:
>> This patch adds memcmp functionality using AVX and SSE intrinsics
>> provided by Intel. For other architectures supported by DPDK regular
>> memcmp function is used.
>>
>> Compiled and tested on Ubuntu 14.04 (non-NUMA) and 15.10 (NUMA) systems.
[...]
>> +	if (unlikely(!_mm_testz_si128(xmm2, xmm2))) {
>> +		__m128i idx =
>> +			_mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
>
> line over 80 characters ;)
>
>> +
>> +		/*
>> +		 * Reverse byte order
>> +		 */
>> +		xmm0 = _mm_shuffle_epi8(xmm0, idx);
>> +		xmm1 = _mm_shuffle_epi8(xmm1, idx);
>> +
>> +		/*
>> +		 * Compare unsigned bytes with instructions for signed bytes
>> +		 */
>> +		xmm0 = _mm_xor_si128(xmm0, _mm_set1_epi8(0x80));
>> +		xmm1 = _mm_xor_si128(xmm1, _mm_set1_epi8(0x80));
>> +
>> +		return _mm_movemask_epi8(xmm0 > xmm1) - _mm_movemask_epi8(xmm1 > xmm0);
>> +	}
>> +
>> +	return 0;
>> +}
[...]
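The quoted 16-byte path relies on two tricks: XOR-ing every byte with 0x80 lets signed byte comparisons order the original bytes as unsigned values, and reversing the byte order moves the first byte into the most significant mask bit, so subtracting the "less than" mask from the "greater than" mask yields the sign of the first difference. Below is a minimal sketch of the same idea, not the patch's code. `cmp16_sketch` is a hypothetical name; it assumes an x86 target with SSE2 (falling back to memcmp() elsewhere) and, instead of the SSSE3 byte shuffle, isolates the lowest set bit of the combined mask. It also uses `_mm_cmpgt_epi8` explicitly: with GCC and Clang, `__m128i` is declared as a vector of two `long long`, so a plain `>` between `__m128i` values compares 64-bit lanes rather than bytes.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: memcmp()-style sign (-1, 0, 1) for exactly 16 bytes. */
static inline int cmp16_sketch(const uint8_t *a, const uint8_t *b)
{
#if defined(__SSE2__)
	__m128i x = _mm_loadu_si128((const __m128i *)a);
	__m128i y = _mm_loadu_si128((const __m128i *)b);
	/* Flip the top bit of every byte so signed byte compares
	 * order the original bytes as unsigned values. */
	const __m128i bias = _mm_set1_epi8((char)0x80);
	x = _mm_xor_si128(x, bias);
	y = _mm_xor_si128(y, bias);
	/* Bit i of each mask describes byte i (byte 0 is bit 0). */
	unsigned gt = (unsigned)_mm_movemask_epi8(_mm_cmpgt_epi8(x, y));
	unsigned lt = (unsigned)_mm_movemask_epi8(_mm_cmpgt_epi8(y, x));
	unsigned diff = gt | lt;
	if (diff == 0)
		return 0;
	/* The lowest set bit marks the first differing byte. */
	unsigned first = diff & (0u - diff);
	return (gt & first) ? 1 : -1;
#else
	/* Non-SSE2 targets: plain memcmp(), normalized to -1/0/1. */
	int r = memcmp(a, b, 16);
	return (r > 0) - (r < 0);
#endif
}
```

For example, if byte 3 differs (0x7f vs 0x80) and byte 9 differs (0xff vs 0x00), the result is -1 because byte 3 decides. Subtracting the raw masks without reversal would give (1 << 9) - (1 << 3) > 0, the wrong sign, which is why the patch reverses the bytes first.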
>> +static inline int
>> +rte_memcmp(const void *_src_1, const void *_src_2, size_t n)
>> +{
>> +	const uint8_t *src_1 = (const uint8_t *)_src_1;
>> +	const uint8_t *src_2 = (const uint8_t *)_src_2;
>> +	int ret = 0;
>> +
>> +	if (n < 16)
>> +		return rte_memcmp_regular(src_1, src_2, n);
[...]
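The quoted entry point hands short inputs (n < 16) to a regular compare and vectorizes only larger sizes. A minimal sketch of that size dispatch, under assumptions: `eq16_sketch` and `memcmp_dispatch_sketch` are hypothetical names, plain memcmp() stands in for the patch's `rte_memcmp_regular()`, and the vector step only tests 16-byte blocks for equality, leaving the sign of the first mismatching block to memcmp().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper: nonzero iff the 16 bytes at a and b are equal. */
static inline int eq16_sketch(const uint8_t *a, const uint8_t *b)
{
#if defined(__SSE2__)
	__m128i x = _mm_loadu_si128((const __m128i *)a);
	__m128i y = _mm_loadu_si128((const __m128i *)b);
	return _mm_movemask_epi8(_mm_cmpeq_epi8(x, y)) == 0xFFFF;
#else
	return memcmp(a, b, 16) == 0;
#endif
}

/* Hypothetical memcmp()-compatible compare that dispatches on size. */
static inline int memcmp_dispatch_sketch(const void *p1, const void *p2,
					 size_t n)
{
	const uint8_t *a = (const uint8_t *)p1;
	const uint8_t *b = (const uint8_t *)p2;
	size_t off;

	/* Short inputs: vector setup is not worth it. */
	if (n < 16)
		return memcmp(a, b, n);

	/* Whole 16-byte blocks; the first unequal block decides the sign. */
	for (off = 0; off + 16 <= n; off += 16)
		if (!eq16_sketch(a + off, b + off))
			return memcmp(a + off, b + off, 16);

	/* Tail: re-check the last 16 bytes. Bytes before `off` are already
	 * known to be equal, so the overlap cannot change the result. */
	if (off < n && !eq16_sketch(a + n - 16, b + n - 16))
		return memcmp(a + n - 16, b + n - 16, 16);

	return 0;
}
```

The overlapping tail avoids a byte-by-byte loop for the last n % 16 bytes, at the cost of re-reading up to 15 bytes.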
>> +
>> +	while (n > 512) {
>> +		ret = rte_cmp256(src_1 + 0 * 256, src_2 + 0 * 256);
>
> Thanks for the great work!
>
> It seems to me there is a big area for improvement to address before
> going into detailed instruction-layout tuning: there is no handling of
> unaligned addresses here for large-size memcmp. So, almost without a
> doubt, performance will be low on micro-architectures like Sandy Bridge
> if the start address is unaligned, which might be a common case.
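The review point is that the large-size path does nothing about the alignment of the start address. One common remedy, sketched here under assumptions (hypothetical name `memcmp_align_sketch`, SSE2, and 16-byte vectors rather than the patch's 256-byte AVX chunks), is to compare a short head until the first pointer reaches a 16-byte boundary and then use aligned loads on that pointer. In general only one pointer can be aligned this way, since the two buffers may sit at different offsets modulo 16.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical sketch: align the first pointer before the bulk loop. */
static inline int memcmp_align_sketch(const void *p1, const void *p2, size_t n)
{
	const uint8_t *a = (const uint8_t *)p1;
	const uint8_t *b = (const uint8_t *)p2;
	/* Bytes needed to bring `a` up to a 16-byte boundary (0..15). */
	size_t head = (size_t)(-(uintptr_t)a & 15);
	int r;

	if (n < 16 + head)
		return memcmp(a, b, n);

	/* Unaligned head: handled once, outside the bulk loop. */
	r = memcmp(a, b, head);
	if (r != 0)
		return r;
	a += head;
	b += head;
	n -= head;

#if defined(__SSE2__)
	while (n >= 16) {
		__m128i x = _mm_load_si128((const __m128i *)a);  /* aligned */
		__m128i y = _mm_loadu_si128((const __m128i *)b); /* maybe not */
		if (_mm_movemask_epi8(_mm_cmpeq_epi8(x, y)) != 0xFFFF)
			return memcmp(a, b, 16); /* sign of first difference */
		a += 16;
		b += 16;
		n -= 16;
	}
#endif
	/* Remaining tail (or everything, without SSE2). */
	return memcmp(a, b, n);
}
```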
This patch has been waiting for comments since May 2016. Updating the patch status to rejected.

Anyone planning to work on a vectorized version of rte_memcmp() can benefit from this patch:
https://patches.dpdk.org/patch/11156/
https://patches.dpdk.org/patch/11157/