[PATCH v5 0/4] KVM: PPC: Expose CPU compatibility modes for nested guests
From: Amit Machhiwal <hidden>
Date: 2026-07-01 05:14:43
On POWER systems, newer processor generations can operate in compatibility
modes corresponding to earlier generations (e.g., a Power11 system running
in Power10 compatibility mode). In such cases, the effective CPU level
exposed to guests differs from the physical processor generation.
This creates a problem for nested virtualization. When booting a nested KVM
guest (L2) inside a host KVM guest (L1) running in a compatibility mode,
userspace (e.g., QEMU) may derive the CPU model from the raw hardware PVR
and attempt to configure the nested guest accordingly. However, the L1
partition is constrained by the compatibility level negotiated with the
hypervisor (L0), and requests exceeding that level are rejected, leading to
guest boot failures such as:
KVM-NESTEDv2: couldn't set guest wide elements
This series provides a mechanism for userspace to query the effective CPU
compatibility modes supported by the host, so it can select an appropriate
CPU model for nested guests.
To achieve this, the series introduces a new KVM capability and ioctl
(KVM_CAP_PPC_COMPAT_CAPS / KVM_PPC_GET_COMPAT_CAPS) that expose the
compatibility modes supported by the host.
Why a new UAPI?
===============
While cpu-version is available in /proc/device-tree/cpus/<cpu#>/cpu-version
both on an L1 booted on PowerNV and on PowerVM LPARs, the UAPI approach is
preferable for several reasons:
1. pHYP (L0) capabilities: On PowerVM, we need to rely on capabilities
negotiated with pHYP in KVM, not just device tree properties. The
cpu-version property reflects the current compat mode but does not
indicate which compat modes are supported for the nested guest.
2. procfs dependency: Not all systems run with procfs enabled (CONFIG_PROC_FS
is optional). Minimal configurations such as buildroot might disable it,
but the KVM ioctl works regardless since it accesses kernel data structures
directly.
3. Kernel validation: The kernel validates and normalizes the compatibility
information, ensuring userspace gets validated, consistent data.
4. Abstraction & stability: /proc/device-tree is an implementation detail.
The UAPI provides a stable interface that won't break if the underlying
mechanism changes.
5. Semantic clarity: KVM_PPC_GET_COMPAT_CAPS clearly expresses what
compatibility modes can be used for KVM guests, vs. parsing device tree
which requires understanding the semantic meaning of cpu-version.
The implementation supports both:
- KVM on PowerVM (nested API v2), where compatibility information is
served from the cached nested_capabilities value, originally obtained
via the H_GUEST_GET_CAPABILITIES hypercall at module init.
- KVM on PowerNV (nested API v1), where compatibility is derived from the
device tree ("cpu-version") representing the effective processor
compatibility level.
This allows userspace (e.g., QEMU) to select a CPU model consistent with
the host compatibility mode, avoiding mismatches and enabling successful
nested guest boot.
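As a rough illustration of the userspace side, the selection step could look
like the sketch below. It is not taken from the QEMU series: the bit values,
the SKETCH_* names and pick_compat_mode() are hypothetical and stand in for
the real KVM_PPC_COMPAT_CAP_* constants and KVM_PPC_COMPAT_BITMASK. The idea
is to honour the requested mode if the host reports it, and otherwise fall
back to the newest mode the host does report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define SKETCH_COMPAT_CAP_POWER9   (1ULL << 0)
#define SKETCH_COMPAT_CAP_POWER10  (1ULL << 1)
#define SKETCH_COMPAT_CAP_POWER11  (1ULL << 2)
#define SKETCH_COMPAT_BITMASK      (SKETCH_COMPAT_CAP_POWER9 | \
                                    SKETCH_COMPAT_CAP_POWER10 | \
                                    SKETCH_COMPAT_CAP_POWER11)

/*
 * Return the compat mode to use for the nested guest:
 *  - 'wanted' if the host reports it as supported,
 *  - otherwise the newest mode the host reports,
 *  - 0 if no known mode is reported at all.
 * 'wanted' must be exactly one capability bit.
 */
static uint64_t pick_compat_mode(uint64_t caps, uint64_t wanted)
{
    static const uint64_t newest_first[] = {
        SKETCH_COMPAT_CAP_POWER11,
        SKETCH_COMPAT_CAP_POWER10,
        SKETCH_COMPAT_CAP_POWER9,
    };
    size_t i;

    assert(wanted != 0 && (wanted & (wanted - 1)) == 0);

    /* Ignore bits this sketch does not know about. */
    caps &= SKETCH_COMPAT_BITMASK;

    if (caps & wanted)
        return wanted;

    for (i = 0; i < sizeof(newest_first) / sizeof(newest_first[0]); i++)
        if (caps & newest_first[i])
            return newest_first[i];

    return 0;
}
```

With these assumed values, a host reporting POWER9 and POWER10 and a request
for POWER11 (derived from the raw PVR) would resolve to POWER10.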
Note: This series is built on top of patches [1] and [2] which must be
applied first. Patch [1] ensures arch_compat is validated against the host
compatibility mode before this series adds the capability query mechanism.
Patch [2] sets CPU_FTR_P11_PVR for Power11 and later processors, which is
needed for proper CPU feature detection in dt-cpu-ftrs environments.
Changes in v5:
- Moved 'size' to be the first member of struct kvm_ppc_compat_caps;
replaced strict size equality with copy_struct_from_user/to_user for
proper forward and backward ABI compatibility; added
KVM_PPC_COMPAT_CAPS_SIZE_VER0 as a frozen version floor constant and
flags == 0 enforcement to prevent ABI ambiguity (patch 1) - [Vaibhav,
Amit]
- Updated PowerVM implementation to use cached nested_capabilities
instead of a live H_GUEST_GET_CAPABILITIES hcall; added a
WARN_ON_ONCE(!nested_capabilities) sanity check (patch 2) - [Vaibhav,
Amit]
- Converted switch in kvmppc_map_compat_capabilities() to use fallthrough
for cumulative compat mode reporting; added of_node_put() in
for_each_node_by_type() to fix OF node reference leak; check 'rc'
error before assigning capabilities (patch 3) - [Vaibhav, Harsh]
- Updated documentation to reflect extensibility model, added E2BIG
error (patch 4) - [Amit]
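For reviewers unfamiliar with the fallthrough pattern mentioned for patch 3,
a minimal standalone sketch is shown below. It is not the code from
kvmppc_map_compat_capabilities(): the level enum, the SKETCH_CAP_* values
and map_level_to_caps() are made up for illustration. Each case ORs in its
own bit and falls through, so a given level also reports every older level,
and the output is written only on success.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits and CPU levels, for illustration only. */
#define SKETCH_CAP_P9   (1ULL << 0)
#define SKETCH_CAP_P10  (1ULL << 1)
#define SKETCH_CAP_P11  (1ULL << 2)

enum sketch_cpu_level {
    SKETCH_LEVEL_P9  = 9,
    SKETCH_LEVEL_P10 = 10,
    SKETCH_LEVEL_P11 = 11,
};

/*
 * Map an effective CPU level to a cumulative capability mask.
 * Returns 0 and fills *caps on success, -1 for an unknown level
 * (in which case *caps is left untouched).
 */
static int map_level_to_caps(int level, uint64_t *caps)
{
    uint64_t mask = 0;

    assert(caps != NULL);

    switch (level) {
    case SKETCH_LEVEL_P11:
        mask |= SKETCH_CAP_P11;
        /* fall through */
    case SKETCH_LEVEL_P10:
        mask |= SKETCH_CAP_P10;
        /* fall through */
    case SKETCH_LEVEL_P9:
        mask |= SKETCH_CAP_P9;
        break;
    default:
        return -1;
    }

    *caps = mask;
    return 0;
}
```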
Changes in v4:
- Added 'size' field to struct kvm_ppc_compat_caps for forward
compatibility and ABI extensibility
- Implemented size validation in ioctl handler to ensure correct structure
size from userspace
- Introduced KVM-specific capability constants (KVM_PPC_COMPAT_CAP_POWER9/
10/11) instead of exposing hypervisor-internal H_GUEST_CAP_* constants
- Added capability masking using KVM_PPC_COMPAT_BITMASK to ensure only
supported processor modes are exposed
- Enhanced error handling with comprehensive error codes (EINVAL, EFAULT,
ENOTTY) and detailed documentation
- Removed Tested-by tags pending re-testing with v4 changes
- Separated validation patch (patch 1 from v3) and sent independently [1]
Changes in v3:
- Added "Why a new UAPI?" section to cover letter addressing questions
about the need for a new UAPI vs. using existing mechanisms like
/proc/device-tree
- Fixed initialization of 'r' in KVM_PPC_GET_COMPAT_CAPS ioctl handler
from 0 to -ENOTTY for proper error handling when the operation is not
supported
- Added Vaibhav's "Suggested-by" tags
- Retained Anushree's "Tested-by" tags as there were no major code changes
- Fixed documentation build warning reported by kernel test robot and
added "Reported-by" and "Closes" tags to patch 5
Changes in v2:
- Squashed patches 2 and 3 from v1 (capability introduction and ioctl
wiring) into a single patch for better logical grouping
- Changed kvm_ppc_compat_caps.flags from __u32 to __u64 for consistency
and future extensibility
- Addressed other review comments
- Improved commit messages with clearer explanations of the changes
Patch summary:
[1/4] Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
[2/4] Implement capability retrieval for KVM on PowerVM (API v2)
[3/4] Add KVM on PowerNV support (API v1)
[4/4] Document the new ioctl
Testing (with QEMU v4 patches and on top of patches [1] and [2]):
KVM APIv1 Testing
=================
On P10 PowerNV machine (L0)
---------------------------
- P10 L1 KVM guest -> works
- P10 nested L2 KVM guest -> works
- P9 compat nested L2 KVM guest -> works
- P9 compat L1 KVM guest -> works
- P9 nested L2 KVM guest -> works
On Powernv11 TCG Guest (L0)
---------------------------
- P11 PowerNV TCG L0 guest -> works
- P11 L1 KVM guest -> works
- P11 L2 KVM guest -> works
- P10 compat L1 KVM guest -> works
- P10 L2 KVM guest -> works
- P9 compat L1 KVM guest -> works
- P9 L2 KVM guest -> works
KVM APIv2 Testing
=================
On P11 PowerVM LPAR (L1)
------------------------
- P11 L2 KVM guest -> works
- P10 compat L2 KVM guest -> works
- P9 compat L2 KVM guest fails to boot as expected
- Without QEMU patches but with Linux patches
  - P11 L2 KVM guest -> works
  - P10 compat L2 KVM guest -> works
  - P9 compat L2 KVM guest fails to boot as expected
- Without Linux patches but with QEMU patches
  - P11 L2 KVM guest -> works
  - P10 compat L2 KVM guest -> works
On P11 LPAR in P10 compat (L1)
------------------------------
- P10 (host compat) L2 KVM guest -> works
- Without QEMU patches but with Linux patches
  - P10 guest fails to boot as expected (error: kvm run failed Invalid argument)
- Without Linux patches but with QEMU patches
  - P10 guest fails to boot as expected (KVM: unknown exit, hardware reason ffffffffffffffea)
On P10 PowerVM LPAR (L1)
------------------------
- P10 L2 KVM guest -> works
- P9 compat L2 KVM guest fails to boot as expected
TCG pSeries Guest
=================
- P11 (default) pSeries guest boots fine
ABI Extensibility Testing (struct size 32, extra member)
=========================================================
- Newer struct on QEMU, older kernel -> works (kernel returns -E2BIG,
QEMU retries with correct size)
- New struct on Linux kernel, older QEMU -> works (kernel zero-pads
trailing fields, QEMU gets correct data)
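The two cases above follow the usual size-versioned struct convention
behind copy_struct_from_user(). The sketch below models that convention in
plain userspace C so the behaviour can be checked in isolation. It is an
approximation, not the ioctl handler from this series: copy_struct_model()
is a made-up name, and the real handler may apply further checks (such as
the flags == 0 rule or the VER0 size floor).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the copy_struct_from_user() rules:
 *  - usize < ksize: copy usize bytes, zero-fill the rest of dst
 *    (older caller, newer kernel struct);
 *  - usize > ksize: accept only if the bytes the kernel does not know
 *    about are all zero, otherwise return -E2BIG
 *    (newer caller, older kernel struct);
 *  - usize == ksize: plain copy.
 */
static int copy_struct_model(void *dst, size_t ksize,
                             const void *src, size_t usize)
{
    size_t size = usize < ksize ? usize : ksize;
    size_t i;

    assert(dst != NULL && src != NULL);

    if (usize > ksize) {
        const unsigned char *tail = (const unsigned char *)src + ksize;

        for (i = 0; i < usize - ksize; i++)
            if (tail[i] != 0)
                return -E2BIG;
    } else if (usize < ksize) {
        memset((unsigned char *)dst + usize, 0, ksize - usize);
    }

    memcpy(dst, src, size);
    return 0;
}
```

On -E2BIG a caller would typically retry with the smaller, older struct
size, which matches the QEMU behaviour described in the first test case.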
With this series, nested guests boot successfully in configurations where
they previously failed due to compatibility mismatches.
Related QEMU series:
====================
A corresponding QEMU v4 series will be sent soon.
Previous QEMU versions:
v3: https://lore.kernel.org/all/20260616113915.25589-1-amachhiw@linux.ibm.com/
v2: https://lore.kernel.org/all/20260502140021.69712-1-amachhiw@linux.ibm.com/
v1: https://lore.kernel.org/all/20260430061333.37905-1-amachhiw@linux.ibm.com/
Previous versions:
==================
v4: https://lore.kernel.org/linuxppc-dev/20260616123314.82721-1-amachhiw@linux.ibm.com/
v3: https://lore.kernel.org/linuxppc-dev/20260522152744.55251-1-amachhiw@linux.ibm.com/
v2: https://lore.kernel.org/linuxppc-dev/20260513100755.83195-1-amachhiw@linux.ibm.com/
v1: https://lore.kernel.org/linuxppc-dev/20260430054906.94431-1-amachhiw@linux.ibm.com/
References:
===========
[1] https://lore.kernel.org/all/20260609053327.61563-1-amachhiw@linux.ibm.com/
[2] https://lore.kernel.org/all/20260614173437.26352-1-amachhiw@linux.ibm.com/
Amit Machhiwal (4):
KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM
on PowerVM
KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM
on PowerNV
KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl
Documentation/virt/kvm/api.rst | 79 +++++++++++++++++++++++++++++
arch/powerpc/include/asm/kvm_ppc.h | 1 +
arch/powerpc/include/uapi/asm/kvm.h | 18 +++++++
arch/powerpc/kvm/book3s_hv.c | 58 +++++++++++++++++++++
arch/powerpc/kvm/powerpc.c | 71 ++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 4 ++
6 files changed, 231 insertions(+)
base-commit: dc59e4fea9d83f03bad6bddf3fa2e52491777482
prerequisite-patch-id: e328a3183c9e9499436c666c30f3659c44e6f3a2
prerequisite-patch-id: 4662f01d2101cfae8502f04290658deed60eec26
--
2.50.1 (Apple Git-155)