[PATCH v4 00/21] arm64: Move kernel mode FPSIMD buffer to the stack
From: Ard Biesheuvel <hidden>
Date: 2025-10-31 10:39:47
Also in:
linux-crypto, lkml
From: Ard Biesheuvel <ardb@kernel.org>

Move the buffer for preserving/restoring the kernel mode FPSIMD state on a
context switch out of struct thread_struct, and onto the stack, so that the
memory cost is not imposed needlessly on all tasks in the system.

Changes since v3:
- Fix sloppy editing errors in ARM CRC code.
- Add more comments about the state argument to kernel_neon_begin/end and
  the associated field in thread_struct
- Fix bug in generic kernel mode FPU change
- Add Rbs from Eric

Changes since v2:
- Fix generic kernel mode FPU API instead of removing it.
- Rebase onto v6.18-rc0 and fix the fallout
- Prefer WARN() over BUG() in kernel_neon_begin/end
- Avoid unnecessary cmpxchg() calls
- When invoked in softirq context, use the caller-provided buffer rather
  than the one stored in the task struct; this permits callers from task
  context (including users of the generic kernel mode FPU API) to pass NULL
  as the buffer when running with preemption disabled.
- Add acks from Kees and Eric; Mark's was dropped along with the patch in
  question.
- Fix new occurrence of kernel_neon_begin/end in Mellanox driver.

Changes since v1:
- Add a patch reverting the arm64 support for the generic
  kernel_fpu_begin()/end() API, which is problematic on arm64.
- Introduce a new 'ksimd' scoped guard that encapsulates the calls to
  kernel_neon_begin() and kernel_neon_end() at a higher level of
  abstraction. This makes it straightforward to plumb in the stack buffer
  without complicating the callers.
- Move all kernel mode NEON users on arm64 (and some on ARM) over to the
  new API.
- Add Mark's ack to patches #6 - #8

Cc: Marc Zyngier <maz@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Kees Cook <redacted>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Eric Biggers <ebiggers@kernel.org>

Ard Biesheuvel (21):
  crypto/arm64: aes-ce-ccm - Avoid pointless yield of the NEON unit
  crypto/arm64: sm4-ce-ccm - Avoid pointless yield of the NEON unit
  crypto/arm64: sm4-ce-gcm - Avoid pointless yield of the NEON unit
  arm64/simd: Add scoped guard API for kernel mode SIMD
  ARM/simd: Add scoped guard API for kernel mode SIMD
  crypto: aegis128-neon - Move to more abstract 'ksimd' guard API
  raid6: Move to more abstract 'ksimd' guard API
  lib/crc: Switch ARM and arm64 to 'ksimd' scoped guard API
  lib/crypto: Switch ARM and arm64 to 'ksimd' scoped guard API
  crypto/arm64: aes-ccm - Switch to 'ksimd' scoped guard API
  crypto/arm64: aes-blk - Switch to 'ksimd' scoped guard API
  crypto/arm64: aes-gcm - Switch to 'ksimd' scoped guard API
  crypto/arm64: nhpoly1305 - Switch to 'ksimd' scoped guard API
  crypto/arm64: polyval - Switch to 'ksimd' scoped guard API
  crypto/arm64: sha3 - Switch to 'ksimd' scoped guard API
  crypto/arm64: sm3 - Switch to 'ksimd' scoped guard API
  crypto/arm64: sm4 - Switch to 'ksimd' scoped guard API
  arm64/xorblocks: Switch to 'ksimd' scoped guard API
  net/mlx5: Switch to more abstract scoped ksimd guard API on arm64
  arm64/fpu: Enforce task-context only for generic kernel mode FPU
  arm64/fpsimd: Allocate kernel mode FP/SIMD buffers on the stack

 arch/arm/include/asm/simd.h                  |   7 +
 arch/arm64/crypto/aes-ce-ccm-glue.c          | 116 +++++------
 arch/arm64/crypto/aes-ce-glue.c              |  87 ++++----
 arch/arm64/crypto/aes-glue.c                 | 139 ++++++-------
 arch/arm64/crypto/aes-neonbs-glue.c          | 150 +++++++-------
 arch/arm64/crypto/ghash-ce-glue.c            |  27 ++-
 arch/arm64/crypto/nhpoly1305-neon-glue.c     |   5 +-
 arch/arm64/crypto/polyval-ce-glue.c          |  12 +-
 arch/arm64/crypto/sha3-ce-glue.c             |  10 +-
 arch/arm64/crypto/sm3-ce-glue.c              |  15 +-
 arch/arm64/crypto/sm3-neon-glue.c            |  16 +-
 arch/arm64/crypto/sm4-ce-ccm-glue.c          |  49 ++---
 arch/arm64/crypto/sm4-ce-cipher-glue.c       |  10 +-
 arch/arm64/crypto/sm4-ce-gcm-glue.c          |  62 ++----
 arch/arm64/crypto/sm4-ce-glue.c              | 214 +++++++++-----------
 arch/arm64/crypto/sm4-neon-glue.c            |  25 +--
 arch/arm64/include/asm/fpu.h                 |  16 +-
 arch/arm64/include/asm/neon.h                |   4 +-
 arch/arm64/include/asm/processor.h           |   7 +-
 arch/arm64/include/asm/simd.h                |  10 +
 arch/arm64/include/asm/xor.h                 |  22 +-
 arch/arm64/kernel/fpsimd.c                   |  53 +++--
 crypto/aegis128-neon.c                       |  33 ++-
 drivers/net/ethernet/mellanox/mlx5/core/wc.c |  19 +-
 lib/crc/arm/crc-t10dif.h                     |  16 +-
 lib/crc/arm/crc32.h                          |  11 +-
 lib/crc/arm64/crc-t10dif.h                   |  16 +-
 lib/crc/arm64/crc32.h                        |  16 +-
 lib/crypto/arm/chacha.h                      |   6 +-
 lib/crypto/arm/poly1305.h                    |   6 +-
 lib/crypto/arm/sha1.h                        |  13 +-
 lib/crypto/arm/sha256.h                      |  12 +-
 lib/crypto/arm/sha512.h                      |   5 +-
 lib/crypto/arm64/chacha.h                    |  11 +-
 lib/crypto/arm64/poly1305.h                  |   6 +-
 lib/crypto/arm64/sha1.h                      |   7 +-
 lib/crypto/arm64/sha256.h                    |  19 +-
 lib/crypto/arm64/sha512.h                    |   8 +-
 lib/raid6/neon.c                             |  17 +-
 lib/raid6/recov_neon.c                       |  15 +-
 40 files changed, 601 insertions(+), 691 deletions(-)

base-commit: 3a8660878839faadb4f1a6dd72c3179c1df56787
-- 
2.51.1.930.gacf6e81ea2-goog
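
For readers unfamiliar with scoped guards, the idea behind the 'ksimd' guard
described in the v1 changelog can be illustrated with a small userspace
sketch. This is not the code from the series: struct fpsimd_buf,
ksimd_mock_enter(), ksimd_mock_exit() and SCOPED_KSIMD_MOCK() are
hypothetical names, the begin/end functions are mocks that only do
bookkeeping, and the macro relies on the GCC/Clang cleanup attribute rather
than the kernel's own guard infrastructure. It only shows how a guard can
pair begin/end automatically while keeping the save buffer on the caller's
stack, assuming sections are not nested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kernel mode FP/SIMD save area (32 x 128 bit). */
struct fpsimd_buf {
	unsigned char vregs[32][16];
};

static int simd_depth;                /* number of open SIMD sections */
static struct fpsimd_buf *active_buf; /* where a context switch would save state */

/* Mock of kernel_neon_begin(): only records the caller-provided buffer. */
static struct fpsimd_buf *ksimd_mock_enter(struct fpsimd_buf *buf)
{
	active_buf = buf;
	simd_depth++;
	return buf;
}

/* Mock of kernel_neon_end(); called automatically by the cleanup attribute. */
static void ksimd_mock_exit(struct fpsimd_buf **bufp)
{
	assert(simd_depth > 0 && active_buf == *bufp);
	simd_depth--;
	active_buf = NULL;
}

/*
 * Hypothetical guard macro: the buffer lives in the enclosing stack frame
 * and the section ends when the scope is left, including by early return.
 * The loop body runs exactly once.
 */
#define SCOPED_KSIMD_MOCK()						\
	for (struct fpsimd_buf buf_, *once_ = &buf_,			\
	     *guard_ __attribute__((cleanup(ksimd_mock_exit))) =	\
		     ksimd_mock_enter(&buf_);				\
	     once_; once_ = NULL)

/* Example caller: sums bytes inside a (mock) SIMD section. */
static int sum_bytes_in_simd_section(const unsigned char *data, size_t len)
{
	int sum = 0;

	SCOPED_KSIMD_MOCK() {
		if (!data)
			return -1;	/* early exit still ends the section */
		for (size_t i = 0; i < len; i++)
			sum += data[i];
	}
	return sum;
}
```

A caller just invokes sum_bytes_in_simd_section(data, len); it never sees the
buffer or the begin/end pair, so a task that never enters such a section
carries no save area at all, which is the memory saving this series is after.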