Thread (34 messages) 34 messages, 4 authors, 2025-06-19

Re: [RFC PATCH 2/2] x86: alternative: Invalidate the cache for updated instructions

From: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Date: 2025-06-11 00:21:14
Also in: lkml

On Tue, 10 Jun 2025 11:50:30 -0400
Steven Rostedt [off-list ref] wrote:
On Tue, 10 Jun 2025 23:47:48 +0900
"Masami Hiramatsu (Google)" [off-list ref] wrote:
quoted
Maybe one possible scenario is to hit the int3 after the third step
somehow (on I-cache).

------
<CPU0>					<CPU1>
					Start smp_text_poke_batch_finish().
					Start the third step. (remove INT3)
					on_each_cpu(do_sync_core)
do_sync_core(do SERIALIZE)
					Finish the third step.
Hit INT3 (from I-cache)
					Clear text_poke_array_refs[cpu0]
Start smp_text_poke_int3_handler()
I believe your analysis is the issue here. The commit that changed the ref
counter from a global to per cpu didn't cause the issue, it just made the
race window bigger.
Agreed. That is a suspicious commit, but even though, as you said
it might just cause the bug easier. Here I wrote refcount as a
per-cpu array because of showing the current code.
quoted
Failed to get text_poke_array_refs[cpu0]
Oops: int3
------

SERIALIZE instruction flashes pipeline, thus the processor needs
to reload the instruction. But it is not ensured to reload it from
memory because SERIALIZE does not invalidate the cache.

To prevent reloading replaced INT3, we need to invalidate the cache
(flush TLB) in the third step, before the do_sync_core().

Reported-by: Linux Kernel Functional Testing <redacted>
Closes: https://lore.kernel.org/all/CA+G9fYsLu0roY3DV=tKyqP7FEKbOEETRvTDhnpPxJGbA=Cg+4w@mail.gmail.com/ (local)
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
---
 arch/x86/kernel/alternative.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index ecfe7b497cad..1b606db48017 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -2949,8 +2949,16 @@ void smp_text_poke_batch_finish(void)
 		do_sync++;
 	}
 
-	if (do_sync)
+	if (do_sync) {
+		/*
+		 * Flush the instructions on the cache, then serialize the
+		 * pipeline of each CPU.
The IPI interrupt should flush the cache. And the TLB should not be an
issue here. If anything, this may work just because it will make the race
smaller. 
I'm not sure, I'm searching it in the Intel SDM.
I'm thinking this may be a QEMU bug. If QEMU doesn't flush the icache on an
IPI then this would indeed be an problem.
Does the qemu manage its icache? (Is that possible to manage it?)
And I guess it is using KVM to run VM, thus the actual cache or TLB
operation has been done by KVM.

Thanks,
-- Steve

quoted
+		 */
+		flush_tlb_kernel_range((unsigned long)text_poke_addr(&text_poke_array.vec[0]),
+				       (unsigned long)text_poke_addr(text_poke_array.vec +
+								text_poke_array.nr_entries - 1));
 		smp_text_poke_sync_each_cpu();
+	}
 
 	/*
 	 * Remove and wait for refs to be zero.

-- 
Masami Hiramatsu (Google) [off-list ref]
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help