Re: [PATCH v8 21/28] KVM: arm64: Add tracing capability for the pKVM hyp
From: Vincent Donnefort <hidden>
Date: 2025-11-20 12:02:00
Also in:
kvmarm, linux-arm-kernel, lkml
Subsystem:
arm64 port (aarch64 architecture), kernel virtual machine for arm64 (kvm/arm64), the rest · Maintainers:
Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton, Linus Torvalds
On Wed, Nov 19, 2025 at 05:06:38PM +0000, Marc Zyngier wrote:
On Fri, 07 Nov 2025 09:38:33 +0000, Vincent Donnefort [off-list ref] wrote:quoted
When running with protected mode, the host has very little knowledge about what is happening in the hypervisor. Of course this is an essential feature for security but nonetheless, that piece of code growing with more responsibilities, we need now a way to debug and profile it. Tracefs by its reliability, versatility and support for user-space is the perfect tool. There's no way the hypervisor could log events directly into the host tracefs ring-buffers. So instead let's use our own, where the hypervisor is the writer and the host the reader. Signed-off-by: Vincent Donnefort <redacted>diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h index 9da54d4ee49e..ad02dee140d3 100644 --- a/arch/arm64/include/asm/kvm_asm.h +++ b/arch/arm64/include/asm/kvm_asm.h@@ -89,6 +89,10 @@ enum __kvm_host_smccc_func { __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_load, __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_put, __KVM_HOST_SMCCC_FUNC___pkvm_tlb_flush_vmid, + __KVM_HOST_SMCCC_FUNC___pkvm_load_tracing, + __KVM_HOST_SMCCC_FUNC___pkvm_unload_tracing, + __KVM_HOST_SMCCC_FUNC___pkvm_enable_tracing, + __KVM_HOST_SMCCC_FUNC___pkvm_swap_reader_tracing, }; #define DECLARE_KVM_VHE_SYM(sym) extern char sym[]diff --git a/arch/arm64/include/asm/kvm_hyptrace.h b/arch/arm64/include/asm/kvm_hyptrace.h new file mode 100644 index 000000000000..9c30a479bc36 --- /dev/null +++ b/arch/arm64/include/asm/kvm_hyptrace.h@@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __ARM64_KVM_HYPTRACE_H_ +#define __ARM64_KVM_HYPTRACE_H_ + +#include <linux/ring_buffer.h> + +struct hyp_trace_desc { + unsigned long bpages_backing_start;Why is this an integer type? You keep casting it all over the place, which tells me that's not the ideal type.
That's because it is a kern VA the hyp needs to convert. However it would indeed make my life easier to declare it as a struct simple_buffer_page * in the struct hyp_trace_buffer below.
quoted
+ size_t bpages_backing_size; + struct trace_buffer_desc trace_buffer_desc; + +}; +#endifdiff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig index 4f803fd1c99a..580426cdbe77 100644 --- a/arch/arm64/kvm/Kconfig +++ b/arch/arm64/kvm/Kconfig@@ -83,4 +83,11 @@ config PTDUMP_STAGE2_DEBUGFS If in doubt, say N. +config PKVM_TRACING + bool + depends on KVM + depends on TRACING + select SIMPLE_RING_BUFFER + default yI'd rather this is made to depend on NVHE_EL2_DEBUG, just like the other debug options.
NVHE_EL2_DEBUG is unsafe for production because of the stage-2 relax on panic. While this one is. So ideally this should be usable even without NVHE_EL2_DEBUG. I can remove this hidden PKVM_TRACING option and use everywhere CONFIG_TRACING. But then I need something to select SIMPLE_RING_BUFFER. Perhaps with the following?
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 2ae6bf499236..c561bf9d4754 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig@@ -38,6 +38,7 @@ menuconfig KVM select SCHED_INFO select GUEST_PERF_EVENTS if PERF_EVENTS select KVM_GUEST_MEMFD + select SIMPLE_RING_BUFFER if CONFIG_TRACING
quoted
+ endif # VIRTUALIZATIONdiff --git a/arch/arm64/kvm/hyp/include/nvhe/trace.h b/arch/arm64/kvm/hyp/include/nvhe/trace.h new file mode 100644 index 000000000000..996e90c0974f --- /dev/null +++ b/arch/arm64/kvm/hyp/include/nvhe/trace.h@@ -0,0 +1,23 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __ARM64_KVM_HYP_NVHE_TRACE_H +#define __ARM64_KVM_HYP_NVHE_TRACE_H +#include <asm/kvm_hyptrace.h> + +#ifdef CONFIG_PKVM_TRACING +void *tracing_reserve_entry(unsigned long length); +void tracing_commit_entry(void); + +int __pkvm_load_tracing(unsigned long desc_va, size_t desc_size); +void __pkvm_unload_tracing(void); +int __pkvm_enable_tracing(bool enable); +int __pkvm_swap_reader_tracing(unsigned int cpu); +#else +static inline void *tracing_reserve_entry(unsigned long length) { return NULL; } +static inline void tracing_commit_entry(void) { } + +static inline int __pkvm_load_tracing(unsigned long desc_va, size_t desc_size) { return -ENODEV; } +static inline void __pkvm_unload_tracing(void) { } +static inline int __pkvm_enable_tracing(bool enable) { return -ENODEV; } +static inline int __pkvm_swap_reader_tracing(unsigned int cpu) { return -ENODEV; } +#endif +#endifdiff --git a/arch/arm64/kvm/hyp/nvhe/Makefile b/arch/arm64/kvm/hyp/nvhe/Makefile index f55a9a17d38f..504c3b9caef8 100644 --- a/arch/arm64/kvm/hyp/nvhe/Makefile +++ b/arch/arm64/kvm/hyp/nvhe/Makefile@@ -29,7 +29,7 @@ hyp-obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \ ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o hyp-obj-y += ../../../kernel/smccc-call.o hyp-obj-$(CONFIG_LIST_HARDENED) += list_debug.o -hyp-obj-$(CONFIG_PKVM_TRACING) += clock.o +hyp-obj-$(CONFIG_PKVM_TRACING) += clock.o trace.o ../../../../../kernel/trace/simple_ring_buffer.oCan we get something less awful here? Surely there is a way to get an absolute path from the kbuild infrastructure? $(objtree) springs to mind...
Ack. [...]
quoted
+int __pkvm_load_tracing(unsigned long desc_hva, size_t desc_size) +{ + struct hyp_trace_desc *desc = (struct hyp_trace_desc *)kern_hyp_va(desc_hva); + int ret; + + if (!desc_size || !PAGE_ALIGNED(desc_hva) || !PAGE_ALIGNED(desc_size)) + return -EINVAL; + + ret = __pkvm_host_donate_hyp(hyp_virt_to_pfn((void *)desc), + desc_size >> PAGE_SHIFT); + if (ret) + return ret; + + if (!hyp_trace_desc_validate(desc, desc_size)) + goto err_donate_desc; + + hyp_spin_lock(&trace_buffer.lock); + + ret = hyp_trace_buffer_load(&trace_buffer, desc); + + hyp_spin_unlock(&trace_buffer.lock); + +err_donate_desc: + WARN_ON(__pkvm_hyp_donate_host(hyp_virt_to_pfn((void *)desc), + desc_size >> PAGE_SHIFT));That's basically a guaranteed panic if anything goes wrong. Are you sure you want to do that?
A failure would mean a lost page for the kernel. As there's really no reason for this to happen (the host_donate_hyp worked few lines above), it sounds alright to panic here in this case. In reclaim_pgtable_pages() applies the same reasoning: if hyp_donate_host fails, something really wrong happened.
quoted
+ return ret; +} + +void __pkvm_unload_tracing(void) +{ + hyp_spin_lock(&trace_buffer.lock); + hyp_trace_buffer_unload(&trace_buffer); + hyp_spin_unlock(&trace_buffer.lock); +} + +int __pkvm_enable_tracing(bool enable) +{ + int cpu, ret = enable ? -EINVAL : 0; + + hyp_spin_lock(&trace_buffer.lock); + + if (!hyp_trace_buffer_loaded(&trace_buffer)) + goto unlock; + + for (cpu = 0; cpu < hyp_nr_cpus; cpu++) + simple_ring_buffer_enable_tracing(per_cpu_ptr(trace_buffer.simple_rbs, cpu), + enable); + + ret = 0; + +unlock: + hyp_spin_unlock(&trace_buffer.lock); + + return ret; +} + +int __pkvm_swap_reader_tracing(unsigned int cpu) +{ + int ret; + + if (cpu >= hyp_nr_cpus) + return -EINVAL; + + hyp_spin_lock(&trace_buffer.lock); + + if (hyp_trace_buffer_loaded(&trace_buffer)) + ret = simple_ring_buffer_swap_reader_page( + per_cpu_ptr(trace_buffer.simple_rbs, cpu));Please keep these things on a single line. I don't care what people (of checkpatch) say.
Ack.
quoted
+ else + ret = -ENODEV; + + hyp_spin_unlock(&trace_buffer.lock); + + return ret; +} -- 2.51.2.1041.gc1ab5b90ca-googThanks, M. -- Without deviation from the norm, progress is not possible.