Thread (13 messages), 4 authors, 2018-12-20

Re: [PATCH v1 1/2] rte_memcmp functions using Intel AVX and SSE intrinsics

From: Ferruh Yigit <hidden>
Date: 2018-12-20 23:30:13

On 5/26/2016 9:57 AM, zhihong.wang at intel.com (Wang, Zhihong) wrote:
quoted
-----Original Message-----
From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Ravi Kerur
Sent: Tuesday, March 8, 2016 7:01 AM
To: dev at dpdk.org
Subject: [dpdk-dev] [PATCH v1 1/2] rte_memcmp functions using Intel AVX and
SSE intrinsics

v1:
        This patch adds memcmp functionality using AVX and SSE
        intrinsics provided by Intel. For other architectures
        supported by DPDK regular memcmp function is used.

        Compiled and tested on Ubuntu 14.04(non-NUMA) and 15.10(NUMA)
        systems.
[...]
quoted
+	if (unlikely(!_mm_testz_si128(xmm2, xmm2))) {
+		__m128i idx =
+			_mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0);
line over 80 characters ;)
quoted
+
+		/*
+		 * Reverse byte order
+		 */
+		xmm0 = _mm_shuffle_epi8(xmm0, idx);
+		xmm1 = _mm_shuffle_epi8(xmm1, idx);
+
+		/*
+		* Compare unsigned bytes with instructions for signed bytes
+		*/
+		xmm0 = _mm_xor_si128(xmm0, _mm_set1_epi8(0x80));
+		xmm1 = _mm_xor_si128(xmm1, _mm_set1_epi8(0x80));
+
+		return _mm_movemask_epi8(xmm0 > xmm1) - _mm_movemask_epi8(xmm1 > xmm0);
+	}
+
+	return 0;
+}
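
For readers following the comparison trick in the hunk above, here is a minimal sketch of the same idea, not the patch's code. XORing every byte with 0x80 lets the signed byte compare give the unsigned ordering that memcmp requires. The name cmp16_sketch is hypothetical. To stay within SSE2 the sketch skips the byte reversal and picks the lowest set mask bit instead (the patch reverses the bytes, apparently so that the first byte lands in the top mask bit and a plain subtraction yields the sign). It also calls _mm_cmpgt_epi8 explicitly rather than using the `>` operator, and falls back to memcmp on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>

/* Hypothetical helper, not from the patch: returns -1, 0 or 1 for 16 bytes. */
static int
cmp16_sketch(const void *a, const void *b)
{
	assert(a != NULL && b != NULL);

	/* XOR with 0x80 maps unsigned byte order onto signed byte order. */
	const __m128i bias = _mm_set1_epi8((char)-128);
	__m128i x = _mm_xor_si128(_mm_loadu_si128((const __m128i *)a), bias);
	__m128i y = _mm_xor_si128(_mm_loadu_si128((const __m128i *)b), bias);

	/* Bit i of each mask is set when byte i of a is greater (or less). */
	unsigned int gt = (unsigned int)_mm_movemask_epi8(_mm_cmpgt_epi8(x, y));
	unsigned int lt = (unsigned int)_mm_movemask_epi8(_mm_cmpgt_epi8(y, x));
	unsigned int diff = gt | lt;

	if (diff == 0)
		return 0;

	/* Lowest set bit marks the first differing byte in memory order. */
	unsigned int first = diff & (0u - diff);
	return (gt & first) ? 1 : -1;
}
#else
/* Fallback for targets without SSE2: normalize memcmp's result to -1/0/1. */
static int
cmp16_sketch(const void *a, const void *b)
{
	assert(a != NULL && b != NULL);

	int r = memcmp(a, b, 16);
	return (r > 0) - (r < 0);
}
#endif
```

Without the XOR, a byte such as 0x90 would compare as negative and sort before 0x11, which is the opposite of what memcmp specifies.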
[...]
quoted
+static inline int
+rte_memcmp(const void *_src_1, const void *_src_2, size_t n)
+{
+	const uint8_t *src_1 = (const uint8_t *)_src_1;
+	const uint8_t *src_2 = (const uint8_t *)_src_2;
+	int ret = 0;
+
+	if (n < 16)
+		return rte_memcmp_regular(src_1, src_2, n);
[...]
quoted
+
+	while (n > 512) {
+		ret = rte_cmp256(src_1 + 0 * 256, src_2 + 0 * 256);
Thanks for the great work!

It seems to me there is a big area for improvement before going into detailed
instruction layout tuning: there is no handling of unaligned addresses here
for large memcmp sizes.

So almost without a doubt the performance will be low on microarchitectures
like Sandy Bridge if the start address is unaligned, which might be a
common case.
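
One possible reading of the alignment suggestion above, as a rough sketch rather than a tested fix: compare a short head until the first pointer reaches a 16-byte boundary, run the vector loop with aligned loads on that pointer, then finish the tail. The function name memcmp_align_sketch is hypothetical. Only one of the two pointers can be aligned in general, so the second one still uses unaligned loads. Whether this actually helps on Sandy Bridge is not measured here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical function, not part of the patch under review. */
static int
memcmp_align_sketch(const void *s1, const void *s2, size_t n)
{
	const uint8_t *p = (const uint8_t *)s1;
	const uint8_t *q = (const uint8_t *)s2;

	/* Head: bytes needed for p to reach the next 16-byte boundary. */
	size_t head = (size_t)(-(uintptr_t)p & 15);
	if (head > n)
		head = n;

	int r = memcmp(p, q, head);
	if (r != 0)
		return r;
	p += head;
	q += head;
	n -= head;

#if defined(__SSE2__)
	/* Body: aligned loads from p, unaligned loads from q. */
	while (n >= 16) {
		__m128i x = _mm_load_si128((const __m128i *)p);
		__m128i y = _mm_loadu_si128((const __m128i *)q);

		/* All 16 mask bits set means the two blocks are equal. */
		if (_mm_movemask_epi8(_mm_cmpeq_epi8(x, y)) != 0xFFFF)
			return memcmp(p, q, 16); /* resolve the sign in this block */
		p += 16;
		q += 16;
		n -= 16;
	}
#endif

	/* Tail: fewer than 16 bytes left (or everything, without SSE2). */
	return memcmp(p, q, n);
}
```

The head is handled with plain memcmp to keep the sketch short; a tuned version would more likely use a single overlapping unaligned vector load.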
The patch has been waiting for comments for a long time, since May 2016.
Updating the patch status to rejected.

Anyone planning to work on a vectorized version of rte_memcmp() can benefit
from this patch:
https://patches.dpdk.org/patch/11156/
https://patches.dpdk.org/patch/11157/