Re: [PATCH RT] arm*: disable NEON in kernel mode
From: Dave Martin <Dave.Martin@arm.com>
Date: 2017-12-01 17:58:51
Also in:
linux-arm-kernel, lkml
On Fri, Dec 01, 2017 at 03:03:35PM +0000, Ard Biesheuvel wrote:
lquoted
On 1 Dec 2017, at 14:36, Sebastian Andrzej Siewior [off-list ref] wrote:quoted
On 2017-12-01 14:18:28 [+0000], Mark Rutland wrote: [Adding Ard, who wrote the NEON crypto code]quoted
On Fri, Dec 01, 2017 at 02:45:06PM +0100, Sebastian Andrzej Siewior wrote: +arm folks, to let you knowquoted
On 2017-12-01 11:43:32 [+0100], To linux-rt-users@vger.kernel.org wrote: NEON in kernel mode is used by the crypto algorithms and raid6 code. While the raid6 code looks okay, the crypto algorithms do not: NEON is enabled on first invocation and may allocate/free/map memory before the NEON mode is disabled again.Could you elaborate on why this is a problem? I guess this is because kernel_neon_{begin,end}() disable preemption? ... is this specific to RT?It is RT specific, yes. One thing are the unbounded latencies since everything in this preempt_disable section can take time depending on the size of the request. The other thing is code like in arch/arm64/crypto/aes-ce-ccm-glue.c:ccm_encrypt() where within this preempt_disable() section skcipher_walk_done() is invoked. That function can allocate/free/map memory which is okay for !RT but is not for RT. I tried to break those loops for x86 [0] and I simply didn't had the time to do the same for ARM. I am aware that store/restore of the NEON registers (as SSE and AVX) is expensive and doing a lot of operations in one go is desired.I wouldn’t mind fixing the code instead. We never disable the neon, but only stack the contents of the registers the first time, and unstack them only before returning to userland (with the exception of nested neon use in softirq context). When this code was introduced, we always stacked/unstacked the whole register file eagerly every time.
+1, at least for arm64. I don't see a really compelling reason for holding kernel-mode NEON around memory management now that we have a strict save-once-restore-lazily model. This may not work so well for arm though -- I haven't looked at that code for a while. If there is memory manamement in any core loop, you already lost the performance battle, and an extra kernel_neon_end()+kernel_neon_begin() may not be that catastrophic. [...] Cheers ---Dave