Thread: 34 messages, 4 authors, 2025-06-19

Re: [RFC PATCH 2/2] x86: alternative: Invalidate the cache for updated instructions

From: Naresh Kamboju <hidden>
Date: 2025-06-12 16:24:17
Also in: lkml

On Thu, 12 Jun 2025 at 05:47, Masami Hiramatsu [off-list ref] wrote:
On Wed, 11 Jun 2025 13:30:01 +0200
Peter Zijlstra [off-list ref] wrote:
On Tue, Jun 10, 2025 at 11:47:48PM +0900, Masami Hiramatsu (Google) wrote:
From: Masami Hiramatsu (Google) <mhiramat@kernel.org>

Invalidate the cache after replacing INT3 with the new instruction.
This will prevent the other CPUs seeing the removed INT3 in their
cache after serializing the pipeline.

LKFT reported an oops by INT3 but there is no INT3 shown in the
dumped code. This means the INT3 is removed after the CPU hits
INT3.

 ## Test log
 ftrace-stress-test: <12>[   21.971153] /usr/local/bin/kirk[277]:
 starting test ftrace-stress-test (ftrace_stress_test.sh 90)
 <4>[   58.997439] Oops: int3: 0000 [#1] SMP PTI
 <4>[   58.998089] CPU: 0 UID: 0 PID: 323 Comm: sh Not tainted
 6.15.0-next-20250605 #1 PREEMPT(voluntary)
 <4>[   58.998152] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
 BIOS 1.16.3-debian-1.16.3-2 04/01/2014
 <4>[   58.998260] RIP: 0010:_raw_spin_lock+0x5/0x50
 <4>[   58.998563] Code: 5d e9 ff 12 00 00 66 66 2e 0f 1f 84 00 00 00
 00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3
 0f 1e fa 0f <1f> 44 00 00 55 48 89 e5 53 48 89 fb bf 01 00 00 00 e8 15
 12 e4 fe
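
The Code: line above can be checked mechanically. The following is a minimal sketch in C that parses the dumped bytes and locates the byte marked with <..>, which is the byte at RIP. Because RIP points just past the one-byte INT3 when the trap is taken, a trapping INT3 (0xcc) would be expected at the byte immediately before the marker. The name parse_code() is made up for this illustration and is not a kernel function.

```c
#include <assert.h>
#include <stdlib.h>

/* Code bytes copied from the oops above; <1f> marks the byte at RIP. */
const char code_line[] =
    "5d e9 ff 12 00 00 66 66 2e 0f 1f 84 00 00 00 "
    "00 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 "
    "0f 1e fa 0f <1f> 44 00 00 55 48 89 e5 53 48 89 fb bf 01 00 00 00 e8 15 "
    "12 e4 fe";

/*
 * Hypothetical helper: parse a "Code:" byte dump into bytes[].
 * Returns the number of bytes parsed and stores the index of the
 * <..>-marked byte (the byte at RIP) in *rip_idx, or -1 if none.
 */
int parse_code(const char *s, unsigned char *bytes, int max, int *rip_idx)
{
    int n = 0;

    assert(s && bytes && rip_idx);
    *rip_idx = -1;
    while (*s && n < max) {
        char *end;
        unsigned long v;

        while (*s == ' ')
            s++;
        if (!*s)
            break;
        if (*s == '<') {        /* marker: this byte is at RIP */
            *rip_idx = n;
            s++;
        }
        v = strtoul(s, &end, 16);
        if (end == s)           /* not a hex byte: stop parsing */
            break;
        bytes[n++] = (unsigned char)v;
        s = end;
        if (*s == '>')
            s++;
    }
    return n;
}
```

Calling parse_code(code_line, buf, 64, &rip) on this dump gives 64 bytes with the marker at index 42. The byte at index 41 is 0x0f, the first byte of the 5-byte NOP 0f 1f 44 00 00, not 0xcc, which matches the observation that no INT3 appears in the dumped code.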

Maybe one possible scenario is to hit the int3 after the third step
somehow (on I-cache).

------
<CPU0>                                      <CPU1>
                                    Start smp_text_poke_batch_finish().
                                    Start the third step. (remove INT3)
                                    on_each_cpu(do_sync_core)
do_sync_core(do SERIALIZE)
                                    Finish the third step.
Hit INT3 (from I-cache)
                                    Clear text_poke_array_refs[cpu0]
Start smp_text_poke_int3_handler()
Failed to get text_poke_array_refs[cpu0]
Oops: int3
------
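
For readers unfamiliar with the steps referred to in the scenario above, here is a simplified, single-threaded sketch in C of a three-step INT3 patching sequence on a plain byte buffer. It is an illustration under assumptions, not the kernel code: poke_three_step() and sync_all_cpus() are hypothetical names, and sync_all_cpus() only counts calls where the kernel would run on_each_cpu(do_sync_core).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define INT3_INSN 0xCC

/* Hypothetical stand-in for the cross-CPU serialization; it only counts calls. */
int sync_count;

void sync_all_cpus(void)
{
    sync_count++;
}

/*
 * Replace len bytes at text with new_insn in three steps so that a
 * concurrent fetch sees either the old instruction, an INT3, or the
 * new instruction, never a mix of old and new bytes.
 */
void poke_three_step(unsigned char *text, const unsigned char *new_insn,
                     size_t len)
{
    assert(text && new_insn && len >= 1);

    /* Step 1: put INT3 on the first byte. */
    text[0] = INT3_INSN;
    sync_all_cpus();

    /* Step 2: write every byte except the first. */
    if (len > 1)
        memcpy(text + 1, new_insn + 1, len - 1);
    sync_all_cpus();

    /* Step 3: replace the INT3 with the first byte of the new instruction. */
    text[0] = new_insn[0];
    sync_all_cpus();
}
```

In this model, the scenario in the diagram corresponds to a CPU still executing the 0xCC written in step 1 after step 3 and its sync have completed, at which point the INT3 handler no longer finds a matching patch site.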

The SERIALIZE instruction flushes the pipeline, so the processor needs
to reload the instruction. But it is not guaranteed to reload it from
memory, because SERIALIZE does not invalidate the cache.

To prevent reloading the replaced INT3, we need to invalidate the cache
(flush TLB) in the third step, before the do_sync_core().
This sounds all sorts of wrong. x86 is supposed to be cache-coherent. A
store should cause the invalidation per MESI and all that. This means
the only place where the old instruction can stick around is in the
uarch micro-ops cache and all that, and SERIALIZE will very much flush
those.
OK, thanks for pointing it out!
Also, TLB flush != I$ flush. There is clflush_cache_range() for this.
But still, this really should not be needed.

Also, this is all qemu, and qemu is known to have gotten this terribly
wrong in the past.
What about KVM? We need to ask Naresh how it is running on the machine.
Naresh, can you tell us how the VM is running? Does that use KVM?
And if so, how is KVM configured (it may depend on the real hardware)?
We do not use KVM; we are running QEMU version 10.0.0.
If you all cannot reproduce on real hardware, I'm considering this a
qemu bug.
It is intermittently reproducible on an x86_64 device and a qemu-x86
device, with and without compat mode.

This link shows how intermittent it is on the linux-next tree.

 - https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20250606/testrun/28685600/suite/log-parser-test/test/oops-oops-int3-smp-pti/history/?page=2

- Naresh
OK, if it is a qemu bug, I will drop [2/2], but I think we still need
[1/2] to avoid the kernel crash (with a warning message and no dump).

Thank you,

--
Masami Hiramatsu (Google) [off-list ref]