Thread: 61 messages, 10 authors, 2019-10-01

RE: [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet encryption

From: Pascal Van Leeuwen <hidden>
Date: 2019-09-27 10:12:01
Also in: linux-crypto

-----Original Message-----
From: Linus Torvalds <torvalds@linux-foundation.org>
Sent: Friday, September 27, 2019 4:06 AM
To: Pascal Van Leeuwen <redacted>
Cc: Ard Biesheuvel <redacted>; Linux Crypto Mailing List
<linux-crypto@vger.kernel.org>; Linux ARM [off-list ref]; Herbert Xu
[off-list ref]; David Miller [off-list ref]; Greg KH
[off-list ref]; Jason A . Donenfeld [off-list ref]; Samuel Neves
[off-list ref]; Dan Carpenter [off-list ref]; Arnd Bergmann
[off-list ref]; Eric Biggers [off-list ref]; Andy Lutomirski [off-list ref];
Will Deacon [off-list ref]; Marc Zyngier [off-list ref]; Catalin Marinas
[off-list ref]
Subject: Re: [RFC PATCH 18/18] net: wireguard - switch to crypto API for packet
encryption

On Thu, Sep 26, 2019 at 5:15 PM Pascal Van Leeuwen
[off-list ref] wrote:
>> But even the CPU only thing may have several implementations, of which
>> you want to select the fastest one supported by the _detected_ CPU
>> features (i.e. SSE, AES-NI, AVX, AVX512, NEON, etc. etc.)
>> Do you think this would still be efficient if that would be some
>> large if-else tree? Also, such a fixed implementation wouldn't scale.
> Just a note on this part.
>
> Yes, with retpoline a large if-else tree is actually *way* better for
> performance these days than even just one single indirect call. I
> think the cross-over point is somewhere around 20 if-statements.
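
To make the comparison above concrete, here is a minimal userspace sketch in C of the two dispatch styles. Everything in it is hypothetical: the `impl_id` values, the stand-in `sum_*` routines (which all compute the same byte sum) and the `last_impl_used()` helper exist only for illustration and are not taken from the kernel. With retpoline enabled, the call through `sum_table` would be routed through a return-based thunk, while the if-else chain contains only conditional branches and direct calls.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical implementation IDs, for illustration only. */
enum impl_id { IMPL_GENERIC, IMPL_SSE, IMPL_AVX2, IMPL_COUNT };

/* Records which stand-in routine ran last, so the dispatch can be observed. */
static enum impl_id last_used = IMPL_COUNT;

static uint32_t sum_bytes(const uint8_t *p, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Stand-ins for per-CPU-feature variants; real ones would differ internally. */
static uint32_t sum_generic(const uint8_t *p, size_t n)
{
    last_used = IMPL_GENERIC;
    return sum_bytes(p, n);
}

static uint32_t sum_sse(const uint8_t *p, size_t n)
{
    last_used = IMPL_SSE;
    return sum_bytes(p, n);
}

static uint32_t sum_avx2(const uint8_t *p, size_t n)
{
    last_used = IMPL_AVX2;
    return sum_bytes(p, n);
}

typedef uint32_t (*sum_fn)(const uint8_t *, size_t);

static const sum_fn sum_table[IMPL_COUNT] = { sum_generic, sum_sse, sum_avx2 };

/* Style 1: one indirect call through a function pointer table. */
uint32_t sum_indirect(enum impl_id id, const uint8_t *p, size_t n)
{
    return sum_table[id](p, n);
}

/* Style 2: an if-else chain; every call target is known at compile time. */
uint32_t sum_direct(enum impl_id id, const uint8_t *p, size_t n)
{
    if (id == IMPL_AVX2)
        return sum_avx2(p, n);
    if (id == IMPL_SSE)
        return sum_sse(p, n);
    return sum_generic(p, n);
}

enum impl_id last_impl_used(void)
{
    return last_used;
}
```

Both functions return the same result for the same `id`; only the shape of the generated branch differs. The sketch makes no claim about where the cross-over point lies on any particular CPU.
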
Yikes, that is just _horrible_ :-(

_However_ there are many CPU architectures out there that _don't_ need
the retpoline mitigation and would be unfairly penalized by the deep
if-else tree (as opposed to the indirect branch) for a problem they
did not cause in the first place.

Wouldn't it be more fair to impose the penalty on the CPUs actually
_causing_ this problem? Also because those are generally the more
powerful CPUs anyway, which would suffer the least from the overhead?
> But those kinds of things also are things that we already handle well
> with instruction rewriting, so they can actually have even less of an
> overhead than a conditional branch. Using code like
>
>   if (static_cpu_has(X86_FEATURE_AVX2))
>
> actually ends up patching the code at run-time, so you end up having
> just an unconditional branch. Exactly because CPU feature choices
> often end up being in critical code-paths where you have
> one-or-the-other kind of setup.
>
> And yes, one of the big users of this is very much the crypto library code.
Ok, I didn't know about that. So I suppose we could have something
like if (static_soc_has(HW_CRYPTO_ACCELERATOR_XYZ)) ... Hmmm ...
> The code to do the above is disgusting, and when you look at the
> generated code you see odd unreachable jumps and what looks like a
> slow "bts" instruction that does the testing dynamically.
>
> And then the kernel instruction stream gets rewritten fairly early
> during the boot depending on the actual CPU capabilities, and the
> dynamic tests get overwritten by a direct jump.
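
As a rough userspace analogue of the "decide once at boot" idea described above, the sketch below probes features a single time and then selects an implementation with plain feature tests. All names here (`FEAT_AES`, `FEAT_AVX2`, `features_init`, `feature_enabled`, `pick_cipher_impl`) are hypothetical and are not kernel APIs. Unlike `static_cpu_has()`, this code still executes a load and a conditional branch on every test; the kernel mechanism goes further by rewriting the instruction stream so that the test itself disappears.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
enum {
    FEAT_AES  = 1u << 0,
    FEAT_AVX2 = 1u << 1,
};

enum cipher_impl { CIPHER_GENERIC, CIPHER_AESNI, CIPHER_AVX2 };

/* Written once during start-up, read-only afterwards. */
static uint32_t detected_features;

/* Stand-in for boot-time CPU probing; the caller supplies the probed bits. */
void features_init(uint32_t probed)
{
    detected_features = probed;
}

/* Dynamic test; the kernel would patch this into a direct jump. */
static inline bool feature_enabled(uint32_t feature)
{
    return (detected_features & feature) != 0;
}

/* Pick the preferred implementation using direct, statically known branches. */
enum cipher_impl pick_cipher_impl(void)
{
    if (feature_enabled(FEAT_AVX2))
        return CIPHER_AVX2;
    if (feature_enabled(FEAT_AES))
        return CIPHER_AESNI;
    return CIPHER_GENERIC;
}
```

For example, calling `features_init(FEAT_AES)` makes `pick_cipher_impl()` return `CIPHER_AESNI`, and with no bits set it falls back to `CIPHER_GENERIC`.
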

> Admittedly I don't think the arm64 people go to quite those lengths,
> but it certainly wouldn't be impossible there either.  It just takes a
> bit of architecture knowledge and a strong stomach ;)
>
>                  Linus

Regards,
Pascal van Leeuwen
Silicon IP Architect, Multi-Protocol Engines @ Verimatrix
www.insidesecure.com
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel