--- v3
+++ v7
@@ -6,12 +6,30 @@
memcmp() can align them and use .Llong comparison mode instead of
falling back to .Lshort comparison mode, which compares byte by byte.
(2) VMX instructions can be used to speed up large size comparisons,
-currently the threshold is set for 4K bytes.
+currently the threshold is set to 4K bytes. Note that using VMX instructions
+incurs a VMX register save/restore penalty. This patch set includes a
+patch that adds a 32-byte pre-check to minimize the penalty.
-glibc commit dec4a7105e (powerpc: Improve memcmp performance for POWER8)
-did the similar. Thanks Cyril Bur's information.
+This is similar to what glibc commit dec4a7105e (powerpc: Improve memcmp
+performance for POWER8) did. Thanks to Cyril Bur for the information.
This patch set also updates the memcmp selftest so that it compiles and
covers large size comparisons.
+
+v6 -> v7:
+- add vcmpequd/vcmpequb .long macro
+- add CPU_FTR pair so that Power7 won't execute Altivec instructions.
+- rework some instructions for higher performance or better readability.
+
+v5 -> v6:
+- correct some comments/commit message.
+- rename VMX_OPS_THRES to VMX_THRESH
+
+v4 -> v5:
+- Expand the 32-byte pre-check to the case where src/dst have different
+offsets, and remove KSM-specific label/comment.
+
+v3 -> v4:
+- Add a 32-byte pre-check before using VMX instructions.
v2 -> v3:
- add optimization for src/dst with different offset against 8 bytes
@@ -29,22 +47,28 @@
- add powerpc/64 to subject/commit message.
-Simon Guo (3):
+Simon Guo (5):
powerpc/64: Align bytes before fall back to .Lshort in powerpc64
- memcmp().
+ memcmp()
+ powerpc: add vcmpequd/vcmpequb ppc instruction macro
powerpc/64: enhance memcmp() with VMX instruction for long bytes
comparision
+ powerpc/64: add 32 bytes prechecking before using VMX optimization on
+ memcmp()
powerpc:selftest update memcmp_64 selftest for VMX implementation
arch/powerpc/include/asm/asm-prototypes.h | 4 +-
+ arch/powerpc/include/asm/ppc-opcode.h | 11 +
arch/powerpc/lib/copypage_power7.S | 4 +-
- arch/powerpc/lib/memcmp_64.S | 374 ++++++++++++++++++++-
+ arch/powerpc/lib/memcmp_64.S | 412 ++++++++++++++++++++-
arch/powerpc/lib/memcpy_power7.S | 6 +-
arch/powerpc/lib/vmx-helper.c | 4 +-
.../selftests/powerpc/copyloops/asm/ppc_asm.h | 4 +-
- .../selftests/powerpc/stringloops/asm/ppc_asm.h | 22 ++
- .../testing/selftests/powerpc/stringloops/memcmp.c | 98 ++++--
- 8 files changed, 476 insertions(+), 40 deletions(-)
+ .../selftests/powerpc/stringloops/asm/ppc-opcode.h | 39 ++
+ .../selftests/powerpc/stringloops/asm/ppc_asm.h | 24 ++
+ .../testing/selftests/powerpc/stringloops/memcmp.c | 98 +++--
+ 10 files changed, 566 insertions(+), 40 deletions(-)
+ create mode 100644 tools/testing/selftests/powerpc/stringloops/asm/ppc-opcode.h
--
1.8.3.1
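For readers less familiar with the assembly, the dispatch strategy described at the top of this cover letter (scalar compare below the 4K threshold, a 32-byte scalar pre-check, then a wide-block loop) can be sketched in portable C. This is a hypothetical illustration only, not the code in arch/powerpc/lib/memcmp_64.S. The names memcmp_sketch, cmp_bytes, cmp_blocks16 and PRECHECK_BYTES are made up for this sketch. The VMX loop is emulated with two 8-byte loads per 16-byte block rather than real vcmpequd/vcmpequb instructions. The aligned 8-byte .Llong path below the threshold is simplified to a byte loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size at which the vector path is assumed to pay for its
 * register save/restore cost (4K, as in the cover letter). */
#define VMX_THRESH 4096
/* Hypothetical name for the length of the scalar pre-check. */
#define PRECHECK_BYTES 32

/* Byte-by-byte compare, standing in for the scalar paths. */
static int cmp_bytes(const unsigned char *a, const unsigned char *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    return 0;
}

/* Emulated "vector" loop: test 16 bytes (one VMX register width) at a
 * time and only locate the exact differing byte once a block mismatches. */
static int cmp_blocks16(const unsigned char *a, const unsigned char *b, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        uint64_t x0, x1, y0, y1;
        memcpy(&x0, a + i, 8);
        memcpy(&x1, a + i + 8, 8);
        memcpy(&y0, b + i, 8);
        memcpy(&y1, b + i + 8, 8);
        if (x0 != y0 || x1 != y1)
            return cmp_bytes(a + i, b + i, 16);
    }
    /* Tail shorter than one block. */
    return cmp_bytes(a + i, b + i, n - i);
}

/* Returns <0, 0 or >0 like memcmp(). */
int memcmp_sketch(const void *p1, const void *p2, size_t n)
{
    const unsigned char *a = p1, *b = p2;

    if (n < VMX_THRESH)
        return cmp_bytes(a, b, n);

    /* Pre-check: if the buffers already differ in the first 32 bytes,
     * return before paying the (emulated) VMX setup cost. */
    int r = cmp_bytes(a, b, PRECHECK_BYTES);
    if (r != 0)
        return r;

    return cmp_blocks16(a + PRECHECK_BYTES, b + PRECHECK_BYTES,
                        n - PRECHECK_BYTES);
}
```

The pre-check is cheap relative to a 4K compare. Its benefit depends on how often large buffers differ early, which is the case the v4 changelog entry targets.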