Thread: 47 messages, 7 authors, 2026-04-09

Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs

From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date: 2026-04-03 23:16:16
Also in: bpf, kvm, linux-doc, lkml

On Fri, Apr 03, 2026 at 02:59:33PM -0700, Jim Mattson wrote:
On Fri, Apr 3, 2026 at 2:34 PM Pawan Gupta
[off-list ref] wrote:
quoted
On Fri, Apr 03, 2026 at 01:19:17PM -0700, Jim Mattson wrote:
quoted
On Fri, Apr 3, 2026 at 11:52 AM Pawan Gupta
[off-list ref] wrote:
quoted
On Fri, Apr 03, 2026 at 11:10:08AM -0700, Jim Mattson wrote:
quoted
On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
[off-list ref] wrote:
quoted
As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
the Branch History Buffer (BHB). On Alder Lake and newer parts this
sequence is not sufficient because it doesn't clear enough entries. This
was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
in the kernel.

Now with VMSCAPE (BHI variant) it is also required to isolate branch
history between guests and userspace. Since BHI_DIS_S only protects the
kernel, the newer CPUs also use IBPB.

A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
But it currently does not clear enough BHB entries to be effective on newer
CPUs with larger BHB. At boot, dynamically set the loop count of
clear_bhb_loop() such that it is effective on newer CPUs too. Use the
X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 arch/x86/entry/entry_64.S            |  8 +++++---
 arch/x86/include/asm/nospec-branch.h |  2 ++
 arch/x86/kernel/cpu/bugs.c           | 13 +++++++++++++
 3 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 3a180a36ca0e..bbd4b1c7ec04 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
        ANNOTATE_NOENDBR
        push    %rbp
        mov     %rsp, %rbp
-       movl    $5, %ecx
+
+       movzbl    bhb_seq_outer_loop(%rip), %ecx
+
        ANNOTATE_INTRA_FUNCTION_CALL
        call    1f
        jmp     5f
@@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
         * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
         * but some Clang versions (e.g. 18) don't like this.
         */
-       .skip 32 - 18, 0xcc
-2:     movl    $5, %eax
+       .skip 32 - 20, 0xcc
+2:     movzbl  bhb_seq_inner_loop(%rip), %eax
 3:     jmp     4f
        nop
 4:     sub     $1, %eax
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 70b377fcbc1c..87b83ae7c97f 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
 extern void update_spec_ctrl_cond(u64 val);
 extern u64 spec_ctrl_current(void);

+extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
+
 /*
  * With retpoline, we must use IBRS to restrict branch prediction
  * before calling into firmware.
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 83f51cab0b1e..2cb4a96247d8 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -2047,6 +2047,10 @@ enum bhi_mitigations {
 static enum bhi_mitigations bhi_mitigation __ro_after_init =
        IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;

+/* Default to short BHB sequence values */
+u8 bhb_seq_outer_loop __ro_after_init = 5;
+u8 bhb_seq_inner_loop __ro_after_init = 5;
+
 static int __init spectre_bhi_parse_cmdline(char *str)
 {
        if (!str)
@@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
                x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
        }

+       /*
+        * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
+        * support), see Intel's BHI guidance.
+        */
+       if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
+               bhb_seq_outer_loop = 12;
+               bhb_seq_inner_loop = 7;
+       }
+
How does this work for VMs in a heterogeneous migration pool that
spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
it isn't available on all hosts in the migration pool, but they need
the long sequence when running on Alder Lake or newer.
As we discussed elsewhere, support for migration pools is much more
involved. It should be dealt with in a separate QEMU/KVM focused series.

A quick fix could be to add support for spectre_bhi=long, which guests in
a migration pool can use?
The simplest solution is to add "|
cpu_feature_enabled(X86_FEATURE_HYPERVISOR)" to the condition above.
If that is unacceptable for the performance of pre-Alder Lake
Yes, that would be unnecessary overhead.
quoted
migration pools, you could define a CPUID or MSR bit that says
explicitly, "long BHB flush sequence needed," rather than trying to
intuit that property from the presence of BHI_CTRL. Like
IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY, the bit would only be set
by a hypervisor.
I will think about this more.
quoted
I am still skeptical of the need for MSR_VIRTUAL_ENUMERATION and
friends, unless there is a major guest OS out there that relies on
them.
If we forget about MSR_VIRTUAL_ENUMERATION for a moment, the userspace VMM
is in the best position to decide whether a guest needs
virtual.SPEC_CTRL[BHI_DIS_S]. Could the userspace VMM, via a KVM interface,
get BHI_DIS_S for the guests that are in a migration pool?
That is not possible today, since KVM does not implement Intel's
IA32_SPEC_CTRL virtualization, and cedes the hardware IA32_SPEC_CTRL
to the guest after the first non-zero write to the guest's MSR.
Yes, KVM doesn't support it yet. But adding that support to give more
control to the userspace VMM helps this case, and probably many others in
the future.

I will check with Chao if he can prepare the next version of virtual
SPEC_CTRL series (leaving out virtual mitigation MSRs).
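
For readers following the thread, the three ways of choosing the loop counts
discussed above can be compared in a small userspace model. This is only a
sketch, not kernel code: struct bhb_inputs, enum bhb_policy and
bhb_select_seq() are hypothetical names, long_seq_hint stands in for the
hypervisor-set CPUID/MSR bit Jim proposes (no such bit is defined in this
thread), and cmdline_long stands in for the suggested spectre_bhi=long
option. Only the 5/5 and 12/7 values come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the boot-time inputs discussed in the thread. */
struct bhb_inputs {
	bool has_bhi_ctrl;   /* models cpu_feature_enabled(X86_FEATURE_BHI_CTRL) */
	bool is_guest;       /* models cpu_feature_enabled(X86_FEATURE_HYPERVISOR) */
	bool long_seq_hint;  /* assumed hypervisor-set "long sequence needed" bit */
	bool cmdline_long;   /* assumed spectre_bhi=long override */
};

/* Mirrors bhb_seq_outer_loop / bhb_seq_inner_loop from the patch. */
struct bhb_seq {
	uint8_t outer;
	uint8_t inner;
};

enum bhb_policy {
	BHB_POLICY_BHI_CTRL_ONLY,  /* the condition in the patch as posted */
	BHB_POLICY_ANY_GUEST,      /* also use the long sequence in any guest */
	BHB_POLICY_EXPLICIT_HINT,  /* also honor an explicit hypervisor hint */
};

static struct bhb_seq bhb_select_seq(enum bhb_policy policy, struct bhb_inputs in)
{
	/* Default to the short sequence, as in the patch. */
	struct bhb_seq seq = { .outer = 5, .inner = 5 };
	bool want_long = in.has_bhi_ctrl || in.cmdline_long;

	switch (policy) {
	case BHB_POLICY_BHI_CTRL_ONLY:
		break;
	case BHB_POLICY_ANY_GUEST:
		want_long = want_long || in.is_guest;
		break;
	case BHB_POLICY_EXPLICIT_HINT:
		want_long = want_long || in.long_seq_hint;
		break;
	default:
		assert(!"unknown bhb_policy");
	}

	if (want_long) {
		/* Long sequence values from the patch. */
		seq.outer = 12;
		seq.inner = 7;
	}
	return seq;
}
```

In this model, a guest that cannot advertise BHI_CTRL keeps the short 5/5
sequence under the posted condition, gets 12/7 unconditionally under the
"any guest" variant (the overhead concern for pre-Alder Lake pools), and
gets 12/7 under the explicit-hint variant only when the hypervisor sets the
hint.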