Thread: 65 messages, 9 authors, 2018-09-25

Re: [PATCH net-next v5 00/20] WireGuard: Secure Network Tunnel

From: Ard Biesheuvel <hidden>
Date: 2018-09-19 17:21:25
Also in: linux-crypto, lkml

On 18 September 2018 at 14:01, Jason A. Donenfeld [off-list ref] wrote:
Hi Ard,

On Tue, Sep 18, 2018 at 11:28:50AM -0700, Ard Biesheuvel wrote:
On 18 September 2018 at 09:16, Jason A. Donenfeld [off-list ref] wrote:
  - While I initially wasn't going to do this for the initial
    patchset, it was just so simple to do: now there's a nosimd
    module parameter that can be used to disable simd instructions
    for debugging and testing, or on weird systems.
I was going to respond in the other thread but it is probably better
to move the discussion here.

My concern about the monolithic nature of each algo module is not only
about SIMD, and it has nothing to do with weird systems. It has to do
with micro-architectural differences which are more common on ARM than
on other architectures *, I suppose. But generalizing from that, it
has to do with policy which is currently owned by userland and not by
the kernel. This will also be important for choosing between the
time-variant but less safe table-based scalar AES and the time-invariant
version (which is substantially slower, especially on decryption) once
we move AES into this library.
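
To make the time-variant versus time-invariant trade-off concrete, here
is a minimal sketch in C of a single table lookup done both ways. The
function names are hypothetical and this is a generic illustration of
the technique, not how any particular kernel AES implementation is
written: the fast version indexes the table directly with a secret
byte, while the constant-time version reads every entry and selects the
wanted one with a mask.

```c
#include <stddef.h>
#include <stdint.h>

/* Time-variant: the address that is read depends on the secret index,
 * so cache behaviour can leak information about it. */
static uint8_t lookup_fast(const uint8_t table[256], uint8_t secret)
{
    return table[secret];
}

/* Time-invariant: every entry is read regardless of the secret, and the
 * wanted entry is selected with a branch-free mask. */
static uint8_t lookup_ct(const uint8_t table[256], uint8_t secret)
{
    uint8_t result = 0;

    for (size_t i = 0; i < 256; i++) {
        uint8_t diff = (uint8_t)(i ^ secret);
        /* mask is 0xFF when diff == 0, and 0x00 otherwise */
        uint8_t mask = (uint8_t)(((uint32_t)diff - 1) >> 8);

        result |= table[i] & mask;
    }
    return result;
}
```

Both functions return the same value; the second does 256 reads per
lookup instead of one, which is where the slowdown comes from.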

So a command line option for the kernel is not the solution here. If
we can't have separate modules, could we at least have per-module
options that put the policy decisions back into userland?

* as an example, the SHA256 NEON code I collaborated on with Andy
Polyakov 2 years ago is significantly faster on some cores and not on
others
Interesting concern. There are micro-architectural quirks on x86 too
that the current code actually already considers. Notably, we use an
AVX-512VL path for Skylake-X but an AVX-512F path for Knights Landing
and Coffee Lake and others, due to thermal throttling when touching the
zmm registers on Skylake-X. So, in the code, we have it automatically
select the right thing based on the micro-architecture.
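
As a rough illustration of the automatic selection described above, here
is a minimal userspace sketch in C. The enum and struct names, the
`select_impl` helper, and the reduced feature set are all hypothetical;
this is not the code from the patchset, only the decision it describes:
take the AVX-512F path unless the model is Skylake-X, where the
AVX-512VL path is used instead.

```c
#include <stdbool.h>

/* Hypothetical model identifiers, for illustration only. */
enum cpu_model { MODEL_OTHER, MODEL_SKYLAKE_X, MODEL_KNIGHTS_LANDING };

/* Hypothetical summary of what was detected at boot. */
struct cpu_info {
    enum cpu_model model;
    bool has_avx512f;
    bool has_avx512vl;
};

enum chacha_impl { IMPL_GENERIC, IMPL_AVX512VL, IMPL_AVX512F };

static enum chacha_impl select_impl(const struct cpu_info *cpu)
{
    /* zmm-based path, skipped on Skylake-X because of throttling. */
    if (cpu->has_avx512f && cpu->model != MODEL_SKYLAKE_X)
        return IMPL_AVX512F;

    /* AVX-512VL path, which does not need the zmm registers. */
    if (cpu->has_avx512f && cpu->has_avx512vl)
        return IMPL_AVX512VL;

    /* Nothing suitable detected: fall back to the generic code. */
    return IMPL_GENERIC;
}
```

The policy lives entirely inside `select_impl`, which is exactly the
property under discussion in this thread: the user has no say in it.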

Is the same thing not possible with ARM? Do you not have access to this
information already, such that the module can just always do the right
thing and not require any user intervention?
That depends on what the right thing is. 'Fastest' does not
necessarily mean 'optimal', and I guess the thermal throttling on
Skylake-X may still result in the most power efficient implementation,
which may be the preferred one in some contexts.

The point is that this is a policy decision, and those belong in
userland not in the kernel.
If so, that would be ideal. If not (and I'm curious to learn why not
exactly), then indeed we could add some runtime knobs in
/sys/module/{algo}/parameters/{knob}, or the like. This would be super easy to do,
should we ever encounter a situation where we're unable to auto-detect
the correct thing.
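
A knob like this could be sketched as follows. This is a userspace model
in C rather than kernel code; the policy names ("auto", "scalar",
"neon"), the `parse_policy` and `resolve_impl` helpers, and the
two-implementation setup are assumptions made for illustration. The
idea is that the user-supplied policy overrides the auto-detected
choice, but never selects code the CPU cannot run.

```c
#include <stdbool.h>
#include <string.h>

enum impl { IMPL_SCALAR, IMPL_NEON };
enum policy { POLICY_AUTO, POLICY_FORCE_SCALAR, POLICY_FORCE_NEON };

/* Parse the text a user might write to the (hypothetical) knob.
 * Assumes the string carries no trailing newline. */
static bool parse_policy(const char *s, enum policy *out)
{
    if (strcmp(s, "auto") == 0)
        *out = POLICY_AUTO;
    else if (strcmp(s, "scalar") == 0)
        *out = POLICY_FORCE_SCALAR;
    else if (strcmp(s, "neon") == 0)
        *out = POLICY_FORCE_NEON;
    else
        return false; /* unknown value: leave *out untouched */
    return true;
}

/* Combine the user's policy with what auto-detection would pick. */
static enum impl resolve_impl(enum policy p, bool has_neon,
                              enum impl auto_choice)
{
    enum impl want = auto_choice;

    if (p == POLICY_FORCE_SCALAR)
        want = IMPL_SCALAR;
    else if (p == POLICY_FORCE_NEON)
        want = IMPL_NEON;

    /* Never hand out NEON code on a CPU that lacks NEON. */
    if (want == IMPL_NEON && !has_neon)
        want = IMPL_SCALAR;
    return want;
}
```

With "auto" as the default, behaviour is unchanged unless userland opts
in to something else.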

Regards,
Jason