[PATCH v2 11/28] arm64/sve: Core task context handling
From: Dave.Martin@arm.com (Dave Martin)
Date: 2017-09-14 19:40:41
Also in:
kvmarm, linux-arch
On Wed, Sep 13, 2017 at 03:21:29PM -0700, Catalin Marinas wrote:
On Wed, Sep 13, 2017 at 08:17:07PM +0100, Dave P Martin wrote:quoted
On Wed, Sep 13, 2017 at 10:26:05AM -0700, Catalin Marinas wrote:quoted
On Thu, Aug 31, 2017 at 06:00:43PM +0100, Dave P Martin wrote:quoted
+/* + * Trapped SVE access + */ +void do_sve_acc(unsigned int esr, struct pt_regs *regs) +{ + /* Even if we chose not to use SVE, the hardware could still trap: */ + if (unlikely(!system_supports_sve()) || WARN_ON(is_compat_task())) { + force_signal_inject(SIGILL, ILL_ILLOPC, regs, 0); + return; + } + + task_fpsimd_save(); + + sve_alloc(current); + fpsimd_to_sve(current); + if (test_and_set_thread_flag(TIF_SVE)) + WARN_ON(1); /* SVE access shouldn't have trapped */ + + task_fpsimd_load(); +}When this function is entered, do we expect TIF_SVE to always be cleared? It's worth adding a comment on the expected conditions. IfYes, and this is required for correctness, as you observe. I had a BUG_ON() here which I removed, but it makes sense to add a comment to capture the precondition here, and how it is satisfied.quoted
that's the case, task_fpsimd_save() would only save the FPSIMD state which is fine. However, you subsequently transfer the FPSIMD state to SVE, set TIF_SVE and restore the full SVE state. If we don't care about the SVE state here, can we call task_fpsimd_load() *before* setting TIF_SVE?There should be no way to reach this code with TIF_SVE set, unless task_fpsimd_load() sets the CPACR trap bit wrongly, or the hardware is broken -- either of which is a bug.Thanks for confirming my assumptions. What I meant was rewriting the above function as: /* reset the SVE state (other than FPSIMD) */ task_fpsimd_save(); task_fpsimd_load();
I think this works, but can you explain your rationale? I think the main effect of your suggestion is that it is cheaper, due to eliminating some unnecessary load/store operations. We could go one better, and do mov v0.16b, v0.16b mov v1.16b, v1.16b // ... mov v31.16b, v31.16b which doesn't require any memory access. But I still prefer to zero p0..p15, ffr for cleanliness, even though the SVE programmer's model doesn't require this (unlike for the Z-reg high bits where we do need to zero them in order not to violate the programmer's model). Currently sve_alloc()+task_fpsimd_load() ensures that all the non-FPSIMD regs are zeroed too, in addition to the Z-reg high bits. So we might want a special-purpose helper -- if so, we can do it all with no memory access. pfalse p0.b // .. pfalse p15.b wrffr p0.b This would allow the memset-zero an sve_alloc() to be removed, but I would need to check what other code is relying on it. I guess I hadn't done this because I viewed it as an optimisation. Thoughts? Cheers ---Dave