Re: [PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save
From: Michael Ellerman <mpe@ellerman.id.au>
Date: 2023-11-20 23:40:51
Also in:
regressions
Timothy Pearson [off-list ref] writes:
----- Original Message -----quoted
From: "Michael Ellerman" <mpe@ellerman.id.au>
...
quoted
But we now have a new path, because io-uring can call copy_process() via create_io_thread() from the signal handling path. That's OK if the signal is handled as we return from a syscall, but it's not OK if the signal is handled due to some other interrupt. Which is: interrupt_return_srr_user() interrupt_exit_user_prepare() interrupt_exit_user_prepare_main() do_notify_resume() get_signal() task_work_run() create_worker_cb() create_io_worker() copy_process() dup_task_struct() arch_dup_task_struct() flush_all_to_thread() save_all() if (tsk->thread.regs->msr & MSR_FP) save_fpu() # fr0 is clobbered and potentially live in userspace So tldr I think the corruption is only an issue since io-uring started doing the clone via signal, which I think matches the observed timeline of this bug appearing.I agree the corruption really only started showing up in earnest on io_uring clone-via-signal, as this was confirmed several times in the course of debugging.
Thanks.
Note as well that I may very well have a wrong call order in the commit message, since I was relying on a couple of WARN_ON() macros I inserted to check for a similar (but not identical) condition and didn't spend much time getting new traces after identifying the root cause.
Yep no worries. I'll reword it to incorporate the full path from my mail.
I went back and grabbed some real world system-wide stack traces, since I now know what to trigger on. A typical example is:
interrupt_return_srr_user()
interrupt_exit_user_prepare()
interrupt_exit_user_prepare_main()
schedule()
__schedule()
__switch_to()
giveup_all()
# tsk->thread.regs->msr MSR_FP is still set here
__giveup_fpu()
save_fpu()
# fr0 is clobbered and potentially live in userspace
fr0 is not live there.
__giveup_fpu() does roughly:
msr = tsk->thread.regs->msr;
msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
msr &= ~MSR_VSX;
tsk->thread.regs = msr;
ie. it clears the FP etc. bits from the task's MSR. That means the FP
state will be reloaded from the thread struct before the task is run again.
Also on that path we're switching to another task, so we'll be reloading
the other task's FP state before returning to userspace.
So I don't see any bug there.
There's only two places that call save_fpu() and skip the giveup logic,
which is save_all() and kvmppc_save_user_regs().
save_all() is only called via clone() so I think that's unable to
actually cause visible register corruption as I described in my previous
mail.
I thought the KVM case was similar, as it's called via an ioctl, but
I'll have to talk to Nick as his mail indicates otherwise.
cheers