Re: [RFC PATCH 0/2] x86: Fix missing core serialization on migration
From: Mathieu Desnoyers <hidden>
Date: 2017-11-14 16:48:42
Also in:
lkml
----- On Nov 14, 2017, at 11:08 AM, Peter Zijlstra peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org wrote:
> On Tue, Nov 14, 2017 at 05:05:41PM +0100, Peter Zijlstra wrote:
>> On Tue, Nov 14, 2017 at 03:17:12PM +0000, Mathieu Desnoyers wrote:
>>> I've tried to create a small single-threaded self-modifying loop in user-space to trigger a trace cache or speculative execution quirk, but I have not succeeded yet. I suspect that I would need to know more about the internals of the processor architecture to create the right stalls that would allow speculative execution to move further ahead, and trigger an incoherent execution flow. Ideas on how to trigger this would be welcome.
>>
>> I thought the whole problem was per definition multi-threaded. Single-threaded stuff can't get out of sync with itself; you'll always observe your own stores.
>>
>> And even if you could, you can always execute a local serializing instruction like CPUID to force things.
What I'm trying to reproduce is something that breaks in the single-threaded case if I explicitly leave out the CPUID core serializing instruction when modifying upcoming code in a loop.

AFAIU, Intel requires a core serializing instruction to be issued between code update and execution even in single-threaded scenarios, to ensure that speculative execution does not observe incoherent code. Now the question we all have for Intel is: is this requirement too strong, or is it required by reality?

Thanks,

Mathieu
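A minimal sketch of the single-threaded experiment described above, assuming x86-64 Linux and GCC or Clang inline assembly. The loop rewrites a tiny function (`mov eax, imm32; ret`) in an RWX page on every iteration and, when `serialize` is set, executes CPUID (the local serializing instruction mentioned above) between the code update and the call. The helpers `count_stale`, `emit_return_const` and `sync_core` are hypothetical names for illustration, not code from this thread; on other architectures, or if the kernel refuses the RWX mapping, the function simply returns -1.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

typedef int (*jit_fn)(void);

#if defined(__x86_64__)
/* Hypothetical helper: x86-64 encoding of "mov eax, imm32; ret" (B8 imm32, C3). */
static void emit_return_const(uint8_t *code, int32_t value)
{
    code[0] = 0xB8;                          /* mov eax, imm32 */
    memcpy(&code[1], &value, sizeof value);  /* little-endian immediate */
    code[5] = 0xC3;                          /* ret */
}

/* Hypothetical helper: CPUID is a core serializing instruction on x86. */
static void sync_core(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
}
#endif

/*
 * Rewrite the code we are about to run on every iteration, then run it.
 * Returns how many runs returned something other than the value just
 * written, or -1 if unsupported (not x86-64) or the RWX mapping fails.
 */
static long count_stale(int iterations, int serialize)
{
#if !defined(__x86_64__)
    (void)iterations;
    (void)serialize;
    return -1;
#else
    const size_t len = 4096;
    uint8_t *page = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    jit_fn fn = (jit_fn)(void *)page;
    long stale = 0;
    for (int i = 0; i < iterations; i++) {
        emit_return_const(page, i);  /* modify upcoming code */
        if (serialize)
            sync_core();             /* CPUID between update and execution */
        if (fn() != i)
            stale++;
    }
    munmap(page, len);
    return stale;
#endif
}
```

Note that the indirect call into the page is itself a branch to the new code, so even with `serialize` set to 0 the count may well stay at zero on real hardware, which would be consistent with the difficulty of reproducing the problem described above.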
> And ISTR the JIT scenario being something like the JIT overwriting previously executed but supposedly no longer used code. And in this scenario you'd want to guarantee all CPUs observe the new code before jumping into it.
>
> The current approach is using mprotect(), except that on a number of platforms the TLB invalidate from that is not guaranteed to be strong enough to sync for code changes.
>
> On x86 the mprotect() should work just fine, since we broadcast IPIs for the TLB invalidate and the IRET from those will get the things synced up again (if nothing else; very likely we'll have done a MOV-CR3 which will of course also have sufficient syncness on it).
>
> But PowerPC, s390, ARM et al that do TLB invalidates without interrupts and don't guarantee their TLB invalidate sync against execution units are left broken by this scheme.
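The mprotect()-based JIT scheme described above can be sketched as follows, again assuming x86-64 Linux. The writer takes a code page that no thread is currently executing, flips it to PROT_READ | PROT_WRITE, rewrites it, restores PROT_READ | PROT_EXEC, and only then publishes it with a release store; a reader thread observes that with an acquire load before jumping in. All names (`jit_ctx`, `jit_rewrite`, `jit_reader`, `jit_demo`) are hypothetical. Per the discussion above, the cross-CPU sync of the instruction stream here relies on the TLB-invalidate IPIs on x86; this sketch does not make the scheme correct on PowerPC, s390 or ARM.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*jit_fn)(void);

enum { PENDING, READY, FAILED };

struct jit_ctx {
    uint8_t *page;     /* code buffer, PROT_READ | PROT_EXEC when published */
    size_t len;
    atomic_int state;  /* PENDING until the writer publishes */
    int result;        /* written by the reader thread */
};

/* Hypothetical helper: x86-64 "mov eax, imm32; ret". */
static void emit_return_const(uint8_t *code, int32_t value)
{
    code[0] = 0xB8;
    memcpy(&code[1], &value, sizeof value);
    code[5] = 0xC3;
}

/* Writer: make the unused page writable, rewrite it, then restore exec.
 * On x86 the TLB shootdown done by mprotect() is what syncs other CPUs. */
static int jit_rewrite(struct jit_ctx *c, int32_t value)
{
    if (mprotect(c->page, c->len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    emit_return_const(c->page, value);
    return mprotect(c->page, c->len, PROT_READ | PROT_EXEC);
}

/* Reader: wait until the new code is published, then jump into it. */
static void *jit_reader(void *p)
{
    struct jit_ctx *c = p;
    int s;
    while ((s = atomic_load_explicit(&c->state, memory_order_acquire)) == PENDING)
        sched_yield();
    if (s == READY)
        c->result = ((jit_fn)(void *)c->page)();
    return NULL;
}

/* Returns the reader's result, or -1 if unsupported or setup fails. */
static int jit_demo(int32_t value)
{
#if !defined(__x86_64__)
    (void)value;
    return -1;
#else
    struct jit_ctx c = { .len = (size_t)sysconf(_SC_PAGESIZE), .result = -1 };
    atomic_init(&c.state, PENDING);
    c.page = mmap(NULL, c.len, PROT_READ | PROT_EXEC,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (c.page == MAP_FAILED)
        return -1;

    pthread_t t;
    if (pthread_create(&t, NULL, jit_reader, &c) != 0) {
        munmap(c.page, c.len);
        return -1;
    }
    int ok = jit_rewrite(&c, value) == 0;
    atomic_store_explicit(&c.state, ok ? READY : FAILED, memory_order_release);
    pthread_join(t, NULL);
    munmap(c.page, c.len);
    return c.result;
#endif
}
```

For example, `jit_demo(42)` should return 42 on an x86-64 machine that permits executable anonymous mappings, and -1 elsewhere. The reader is started before the rewrite so that it is already running when the new code is published, mirroring the "all CPUs observe the new code before jumping into it" requirement.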
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com