Thread: 35 messages, 5 authors, 2025-03-18

Re: [PATCH net 03/24] crypto: Add 'krb5enc' hash and cipher AEAD algorithm

From: Eric Biggers <ebiggers@kernel.org>
Date: 2025-02-09 19:05:27
Also in: linux-crypto, linux-fsdevel, linux-nfs, lkml

On Sun, Feb 09, 2025 at 06:37:27PM +0000, David Howells wrote:
> One of the issues I have with doing it on the CPU is that you have to do two
> operations and, currently, they're done synchronously and serially.
>
> Can you implement "auth5enc(hmac(sha256),cts(cbc(aes)))" in assembly and
> actually make the assembly do both the AES and SHA at the same time?  It looks
> like it *might* be possible - but that you might be an XMM register short of
> being able to do it:-/

Yes, that would be the proper way to optimize that algorithm.  Someone just
needs to do it.  (And presumably you want this one and not Camellia which you
are also pushing for some reason?)

> > I don't see why off-CPU hardware offload support should deserve much
> > attention here, given the extremely high speed of on-CPU crypto these days
> > and the great difficulty of integrating off-CPU acceleration efficiently.
> > In particular it seems weird to consider Intel QAT a reasonable thing to use
> > over VAES.
>
> Because some modern CPUs come with on-die crypto offload - and that can do
> hash+encrypt or encrypt+hash in parallel.  Now, there are a couple of issues
> with using the QAT here:
>
>  (1) It doesn't support CTS.  This means we'd have to impose the CTS from
>      above - and that may well make it unusable in doing hash + encrypt
>      simultaneously.
>
>  (2) It really needs batching to make it cheap enough to use.  This might
>      actually be less of a problem - at least for rxgk.  The data is split up
>      into fixed-size packets, but for a large amount of data we can end up
>      filling packets faster than we can transmit them.  This offers the
>      opportunity to batch them - up to ~8192 packets in a single batch.
>
> For NFS, things are a bit different.  Because that mostly uses a streaming
> transport these days, it wants to prepare a single huge message in one go -
> and being able to parallelise the encrypt and the hash could be a benefit.

Right, the batching is always a huge issue for those types of accelerators.  A
much more promising approach is to just fully take advantage of the CPU
instructions that already accelerate the same algorithms very well.

- Eric
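
Point (1) in the quoted text says that CTS would have to be imposed from above when the engine only offers plain CBC. A minimal sketch of what that split might look like follows. It assumes AES's 16-byte block and the common arrangement where plain CBC covers everything up to the last block boundary before the final byte. The names `cts_split_len` and `struct cts_split` are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK 16u /* AES block size in bytes */

/* Hypothetical helper type: how one CTS message divides into two stages. */
struct cts_split {
	size_t cbc_len;  /* leading bytes that plain CBC can process */
	size_t tail_len; /* trailing bytes left for the stealing step */
};

/*
 * Work out the split for a cts(cbc(aes)) message of len bytes.
 * Returns 0 on success, -1 if len is shorter than one block
 * (CTS needs at least one full block of input).
 */
static int cts_split_len(size_t len, struct cts_split *out)
{
	assert(out != NULL);

	if (len < AES_BLOCK)
		return -1;

	if (len == AES_BLOCK) {
		/* Exactly one block: plain CBC, nothing to steal. */
		out->cbc_len = len;
		out->tail_len = 0;
		return 0;
	}

	/*
	 * Round (len - 1) down to a block boundary, so the tail is
	 * always 1..16 bytes, even when len is block aligned.
	 */
	out->cbc_len = ((len - 1) / AES_BLOCK) * AES_BLOCK;
	out->tail_len = len - out->cbc_len;
	return 0;
}
```

For a 100-byte message this gives 96 bytes of plain CBC and a 4-byte tail. The stealing step then has to work on the last CBC output block together with that tail, which is the part an engine without CTS support leaves to the caller.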