Re: [PATCH net-next v6 07/23] zinc: ChaCha20 ARM and ARM64 implementations
From: Ard Biesheuvel <hidden>
Date: 2018-09-26 14:02:40
Also in:
linux-arm-kernel, lkml, netdev
(+ Herbert, Thomas) On Wed, 26 Sep 2018 at 15:33, Jason A. Donenfeld [off-list ref] wrote:
Hi Ard, On Wed, Sep 26, 2018 at 10:59 AM Ard Biesheuvel [off-list ref] wrote:quoted
quoted
+static inline bool chacha20_arch(struct chacha20_ctx *state, u8 *dst, + const u8 *src, size_t len, + simd_context_t *simd_context) +{ +#if defined(CONFIG_KERNEL_MODE_NEON) + if (chacha20_use_neon && len >= CHACHA20_BLOCK_SIZE * 3 && + simd_use(simd_context)) + chacha20_neon(dst, src, len, state->key, state->counter); + else +#endifBetter to use IS_ENABLED() here:quoted
+ if (IS_ENABLED(CONFIG_KERNEL_MODE_NEON)) && + chacha20_use_neon && len >= CHACHA20_BLOCK_SIZE * 3 && + simd_use(simd_context))Good idea. I'll fix that up.quoted
Also, this still has unbounded worst case scheduling latency, given that the outer library function passes its entire input straight into the NEON routine.The vast majority of crypto routines in arch/*/crypto/ follow this same exact pattern, actually. I realize a few don't -- probably the ones you had a hand in :) -- but I think this is up to the caller to handle.
Anything that uses the scatterwalk API (AEADs and skciphers) will handle at most a page at a time. Hashes are different, which is why some of them have to handle it explicitly.
I made a change so that in chacha20poly1305.c, it calls simd_relax after handling each scatter-gather element, so a "construction" will handle this gracefully. But I believe it's up to the caller to decide on what sizes of information it wants to pass to primitives. Put differently, this also hasn't ever been an issue before -- the existing state of the tree indicates this -- and so I don't anticipate this will be a real issue now.
The state of the tree does not capture all relevant context or history. The scheduling latency issue was brought up very recently by the -rt folks on the mailing lists.
And if it becomes one, this is something we can address *later*, but certainly there's no use of adding additional complexity to the initial patchset to do this now.
You are introducing a very useful SIMD abstraction, but it lets code run with preemption disabled for unbounded amounts of time, and so now is the time to ensure we get it right. Part of the [justified] criticism on the current state of the crypto API is on its complexity, and so I don't think it makes sense to keep it simple now and add the complexity later (and the same concern applies to async support btw).