Re: [PATCH 5/6] crypto: x86/sm3 - add AVX assembly implementation | linux-crypto

From: Tianjia Zhang <hidden>
Date: 2021-12-21 07:39:37
Also in: linux-arm-kernel, lkml


On 12/21/21 2:03 AM, Jussi Kivilinna wrote:

On 20.12.2021 10.22, Tianjia Zhang wrote:

This patch adds an AVX assembly accelerated implementation of the SM3
secure hash algorithm. According to the benchmark data, the performance
increase over the pure software implementation sm3-generic is up to 38%.

The main algorithm implementation is based on the SM3 AES/BMI2
accelerated work by libgcrypt at:
https://gnupg.org/software/libgcrypt/index.html

Benchmark on Intel i5-6200U 2.30GHz, comparing two implementations:
pure software sm3-generic and the accelerated sm3-avx. The data comes
from modes 326 and 422 of tcrypt. The columns are the lengths of each
update. The data is tabulated and the unit is Mb/s:

update-size |     16      64     256    1024    2048    4096    8192
--------------------------------------------------------------------
sm3-generic | 105.97  129.60  182.12  189.62  188.06  193.66  194.88
sm3-avx     | 119.87  163.05  244.44  260.92  257.60  264.87  265.88

Signed-off-by: Tianjia Zhang <redacted>
---
  arch/x86/crypto/Makefile         |   3 +
  arch/x86/crypto/sm3-avx-asm_64.S | 521 +++++++++++++++++++++++++++++++
  arch/x86/crypto/sm3_avx_glue.c   | 134 ++++++++
  crypto/Kconfig                   |  13 +
  4 files changed, 671 insertions(+)
  create mode 100644 arch/x86/crypto/sm3-avx-asm_64.S
  create mode 100644 arch/x86/crypto/sm3_avx_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index f307c93fc90a..7cbe860f6201 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile

@@ -88,6 +88,9 @@ nhpoly1305-avx2-y := nh-avx2-x86_64.o nhpoly1305-avx2-glue.o
  obj-$(CONFIG_CRYPTO_CURVE25519_X86) += curve25519-x86_64.o
+obj-$(CONFIG_CRYPTO_SM3_AVX_X86_64) += sm3-avx-x86_64.o
+sm3-avx-x86_64-y := sm3-avx-asm_64.o sm3_avx_glue.o
+
  obj-$(CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64) += sm4-aesni-avx-x86_64.o
  sm4-aesni-avx-x86_64-y := sm4-aesni-avx-asm_64.o sm4_aesni_avx_glue.o

diff --git a/arch/x86/crypto/sm3-avx-asm_64.S b/arch/x86/crypto/sm3-avx-asm_64.S
new file mode 100644
index 000000000000..e7a9a37f3609

--- /dev/null
+++ b/arch/x86/crypto/sm3-avx-asm_64.S

@@ -0,0 +1,521 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * SM3 AVX accelerated transform.
+ * specified in: https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02
+ *
+ * Copyright (C) 2021 Jussi Kivilinna [off-list ref]
+ * Copyright (C) 2021 Tianjia Zhang [off-list ref]
+ */

<snip>

+
+#define R(i, a, b, c, d, e, f, g, h, round, widx, wtype)                      \
+    /* rol(a, 12) => t0 */                                                \
+    roll3mov(12, a, t0); /* rorxl here would reduce perf by 6% on zen3 */ \
+    /* rol (t0 + e + t), 7) => t1 */                                      \
+    addl3(t0, e, t1);                                                     \
+    addl $K##round, t1;                                                   \

It's better to use "leal K##round(t0, e, 1), t1;" here and fix the K0-K63
macros instead, as I noted on the libgcrypt mailing list:
  https://lists.gnupg.org/pipermail/gcrypt-devel/2021-December/005209.html

-Jussi

Thanks for pointing it out, I will fix it in the next patch.

Best regards,
Tianjia
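
For readers following the thread, the suggested change can be checked with a
small C sketch. It models the quoted step "rol(t0 + e + t, 7) => t1" in two
ways: first as the two additions in the patch (addl3 followed by addl), then
as a single sum in which the round constant acts as the displacement of a
leal. The helper names (rol32, sm3_k, sm3_k_disp, ss1_add_add, ss1_lea) are
made up for this illustration and are not taken from the patch or from
libgcrypt. The sketch assumes the SM3 round constants K_j = rol(T_j, j mod 32)
with T_j = 0x79cc4519 for j < 16 and 0x7a879d8a otherwise. It also assumes
that "fix K0-K63 macros" refers to writing the constants so that they fit a
sign-extended 32-bit displacement; that is one reading of the comment, not
something the thread confirms.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n is taken modulo 32). */
static uint32_t rol32(uint32_t x, unsigned int n)
{
	n &= 31;
	return n ? (x << n) | (x >> (32 - n)) : x;
}

/* SM3 round constant K_j = rol(T_j, j mod 32). */
static uint32_t sm3_k(unsigned int j)
{
	assert(j < 64);
	return rol32(j < 16 ? 0x79cc4519u : 0x7a879d8au, j);
}

/*
 * The same constant as a signed 32-bit value, i.e. in the form a
 * sign-extended displacement would need (assumption, see lead-in).
 */
static int32_t sm3_k_disp(unsigned int j)
{
	uint32_t k = sm3_k(j);

	if (k <= (uint32_t)INT32_MAX)
		return (int32_t)k;
	/* Map the upper half of the range to negative values portably. */
	return (int32_t)(k - 0x80000000u) + INT32_MIN;
}

/* Model of the patch as posted: two separate additions. */
static uint32_t ss1_add_add(uint32_t a, uint32_t e, unsigned int j)
{
	uint32_t t0 = rol32(a, 12);
	uint32_t t1 = t0 + e;	/* addl3(t0, e, t1) */

	t1 += sm3_k(j);		/* addl $K##round, t1 */
	return rol32(t1, 7);
}

/* Model of the suggestion: base + index + displacement in one step. */
static uint32_t ss1_lea(uint32_t a, uint32_t e, unsigned int j)
{
	uint32_t t0 = rol32(a, 12);
	/* leal K##round(t0, e, 1), t1: the sum wraps modulo 2^32 */
	uint32_t t1 = t0 + e + (uint32_t)sm3_k_disp(j);

	return rol32(t1, 7);
}
```

Both models return the same value for any a, e and round index below 64,
because 32-bit addition wraps modulo 2^32 and the signed form of each
constant is congruent to the unsigned one. The sketch says nothing about
performance; it only shows that the two forms compute the same result.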
