Thread (12 messages), 4 authors, 2021-12-15

RE: [PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way interleave of aes and ghash

From: Xiaokang Qian <hidden>
Date: 2021-12-14 01:40:07
Also in: linux-arm-kernel, lkml

Hi Will:
I will post the updated version (v2) of this patch today or tomorrow.
Sorry for the delay.
-----Original Message-----
From: Will Deacon <will@kernel.org>
Sent: Tuesday, December 14, 2021 2:29 AM
To: Ard Biesheuvel <ardb@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>; Xiaokang Qian [off-list ref];
Herbert Xu [off-list ref]; David S. Miller [off-list ref];
Catalin Marinas [off-list ref]; nd [off-list ref];
Linux Crypto Mailing List [off-list ref];
Linux ARM <linux-arm-kernel@lists.infradead.org>;
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] crypto: arm64/gcm-ce - unroll factors to 4-way
interleave of aes and ghash

On Tue, Sep 28, 2021 at 11:04:03PM +0200, Ard Biesheuvel wrote:
On Tue, 28 Sept 2021 at 08:27, Eric Biggers [off-list ref] wrote:
On Thu, Sep 23, 2021 at 06:30:25AM +0000, XiaokangQian wrote:
To improve performance on cores with deep pipelines such as A72 and N1,
implement gcm(aes) using a 4-way interleave of aes and ghash
(8 blocks in parallel in total), which makes fuller use of the
pipelines than the 4-way interleave currently used. It gains
about 20% for large data sizes such as 8k.

This is a completely new version of the GCM part of the combined
GCM/GHASH driver. It will co-exist with the old driver and only
serve large data sizes. Instead of interleaving four invocations of
AES where each chunk of 64 bytes is encrypted first and then
ghashed, the new version uses a more coarse-grained approach where
a chunk of 64 bytes is encrypted and, at the same time, one chunk
of 64 bytes is ghashed (or ghashed and decrypted in the converse case).
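
[Editorial illustration, not part of the patch.] The ordering change
described above can be sketched in plain C. This is only a sketch of the
data flow under stated assumptions: the real driver is arm64 assembly that
interleaves at the instruction level, and every name here (CHUNK,
toy_encrypt_chunk, toy_hash_chunk, encrypt_then_hash, pipelined,
orders_agree) is hypothetical. The two toy primitives are placeholders and
are NOT AES or GHASH. The point is that hashing ciphertext chunk n-1 does
not depend on encrypting chunk n, so the two can be issued together without
changing the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 64 /* bytes per chunk, as in the description above */

/* Placeholder for AES-CTR on one chunk: NOT real AES, just a keyed XOR. */
static void toy_encrypt_chunk(uint8_t *dst, const uint8_t *src, uint32_t ctr)
{
    for (size_t i = 0; i < CHUNK; i++)
        dst[i] = src[i] ^ (uint8_t)(ctr * 31u + i);
}

/* Placeholder for GHASH on one chunk: NOT real GHASH, just a running mix. */
static uint64_t toy_hash_chunk(uint64_t acc, const uint8_t *ct)
{
    for (size_t i = 0; i < CHUNK; i++)
        acc = (acc ^ ct[i]) * 0x100000001b3ull;
    return acc;
}

/* Old order: each chunk is encrypted first and then hashed. */
static uint64_t encrypt_then_hash(uint8_t *dst, const uint8_t *src,
                                  size_t nchunks)
{
    uint64_t acc = 0;

    for (size_t n = 0; n < nchunks; n++) {
        toy_encrypt_chunk(dst + n * CHUNK, src + n * CHUNK, (uint32_t)n);
        acc = toy_hash_chunk(acc, dst + n * CHUNK); /* waits on the line above */
    }
    return acc;
}

/* New order: encrypt chunk n while hashing the ciphertext of chunk n-1. */
static uint64_t pipelined(uint8_t *dst, const uint8_t *src, size_t nchunks)
{
    uint64_t acc = 0;

    if (nchunks == 0)
        return acc;

    toy_encrypt_chunk(dst, src, 0); /* prologue: nothing to hash yet */
    for (size_t n = 1; n < nchunks; n++) {
        /* Independent work: assembly could interleave these two freely. */
        toy_encrypt_chunk(dst + n * CHUNK, src + n * CHUNK, (uint32_t)n);
        acc = toy_hash_chunk(acc, dst + (n - 1) * CHUNK);
    }
    /* Epilogue: hash the last ciphertext chunk. */
    return toy_hash_chunk(acc, dst + (nchunks - 1) * CHUNK);
}

/* Returns 1 when both orders give identical ciphertext and hash value. */
static int orders_agree(size_t nchunks)
{
    uint8_t src[8 * CHUNK], a[8 * CHUNK], b[8 * CHUNK];

    assert(nchunks <= 8);
    for (size_t i = 0; i < sizeof(src); i++)
        src[i] = (uint8_t)(i * 7u);

    uint64_t ha = encrypt_then_hash(a, src, nchunks);
    uint64_t hb = pipelined(b, src, nchunks);

    return ha == hb && memcmp(a, b, nchunks * CHUNK) == 0;
}
```

Calling orders_agree() for a few chunk counts checks that the reordering
only changes when the hash work is issued, not what is computed. The
prologue and epilogue are the price of the coarser pipelining, which fits
the description's note that the new path only serves large inputs.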

The table below compares the performance of the old driver and the
new one on various micro-architectures and running in various
modes with various data sizes.

            |     AES-128       |     AES-192       |     AES-256       |
     #bytes | 1024 | 1420 |  8k | 1024 | 1420 |  8k | 1024 | 1420 |  8k |
     -------+------+------+-----+------+------+-----+------+------+-----+
        A72 | 5.5% |  12% | 25% | 2.2% |  9.5%|  23%| -1%  |  6.7%| 19% |
        A57 |-0.5% |  9.3%| 32% | -3%  |  6.3%|  26%| -6%  |  3.3%| 21% |
        N1  | 0.4% |  7.6%|24.5%| -2%  |  5%  |  22%| -4%  |  2.7%| 20% |

Signed-off-by: XiaokangQian <redacted>
Does this pass the self-tests, including the fuzz tests which are
enabled by CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y?
Please test both little-endian and big-endian. (Note that you don't
need a big-endian user space for this - the self tests are executed
before the rootfs is mounted)
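
[Editorial illustration, not part of the thread.] A kernel config fragment
along these lines should cover the request above. Apart from
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS, which is named in the message, the
symbols are assumptions about a typical arm64 test build of that period:
the driver is assumed to be built in via CONFIG_CRYPTO_GHASH_ARM64_CE so
that its self-tests run at boot, and the big-endian pass is assumed to be
a second build with CONFIG_CPU_BIG_ENDIAN enabled.

```
# Assumed test-kernel fragment, not taken from the thread.
CONFIG_DEBUG_KERNEL=y
# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
# Built in, so the self-tests run before the rootfs is mounted.
CONFIG_CRYPTO_GHASH_ARM64_CE=y
# Second build only: leave this unset for the little-endian run.
CONFIG_CPU_BIG_ENDIAN=y
```

Any self-test failure would then be expected to show up in the boot log.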

Also, you will have to rebase this onto the latest cryptodev tree,
which carries some changes I made recently to this driver.
XiaokangQian -- did you post an updated version of this? It would end up
going via Herbert, but I was keeping half an eye on it and it all seems to have
gone quiet.

Thanks,

Will