Re: [PATCH 14/30] KVM: arm64: Hook up reclaim hypercall to pkvm_pgtable_stage2_destroy()
From: Quentin Perret <hidden>
Date: 2026-01-09 14:58:03
On Friday 09 Jan 2026 at 14:35:57 (+0000), Will Deacon wrote:
> On Tue, Jan 06, 2026 at 02:59:19PM +0000, Quentin Perret wrote:
> > On Monday 05 Jan 2026 at 15:49:22 (+0000), Will Deacon wrote:
> > > During teardown of a protected guest, its memory pages must be reclaimed
> > > from the hypervisor by issuing the '__pkvm_reclaim_dying_guest_page'
> > > hypercall. Add a new helper, __pkvm_pgtable_stage2_reclaim(), which is
> > > called during the VM teardown operation to reclaim pages from the
> > > hypervisor and drop the GUP pin on the host.
> > >
> > > Signed-off-by: Will Deacon <will@kernel.org>
> > > ---
> > >  arch/arm64/kvm/pkvm.c | 31 ++++++++++++++++++++++++++++++-
> > >  1 file changed, 30 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
> > > index 1814e17d600e..8be91051699e 100644
> > > --- a/arch/arm64/kvm/pkvm.c
> > > +++ b/arch/arm64/kvm/pkvm.c
> > > @@ -322,6 +322,32 @@ int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
> > >  	return 0;
> > >  }
> > >
> > > +static int __pkvm_pgtable_stage2_reclaim(struct kvm_pgtable *pgt, u64 start, u64 end)
> > > +{
> > > +	struct kvm *kvm = kvm_s2_mmu_to_kvm(pgt->mmu);
> > > +	pkvm_handle_t handle = kvm->arch.pkvm.handle;
> > > +	struct pkvm_mapping *mapping;
> > > +	int ret;
> > > +
> > > +	for_each_mapping_in_range_safe(pgt, start, end, mapping) {
> > > +		struct page *page;
> > > +
> > > +		ret = kvm_call_hyp_nvhe(__pkvm_reclaim_dying_guest_page,
> > > +					handle, mapping->gfn);
> > > +		if (WARN_ON(ret))
> > > +			return ret;
> > > +
> > > +		page = pfn_to_page(mapping->pfn);
> > > +		WARN_ON_ONCE(mapping->nr_pages != 1);
> > > +		unpin_user_pages_dirty_lock(&page, 1, true);
> > > +		account_locked_vm(current->mm, 1, false);
> > > +		pkvm_mapping_remove(mapping, &pgt->pkvm_mappings);
> > > +		kfree(mapping);
> >
> > Nit: this might take a while, worth adding a cond_resched() in here?
>
> Commit 4ddfab5436b6 ("KVM: arm64: Reschedule as needed when destroying
> the stage-2 page-tables") breaks the destruction operation up into
> chunks so we now reschedule from the caller in stage2_destroy_range().
>
> Do you think we need an additional reschedule?

I would intuitively think so, but I've also just realized we already don't have that resched for np guests, so this patch is absolutely fine as-is. It's just that knocking down a whole 1G (assuming 4K pages) will be significantly faster for normal KVM than for pKVM. It's one call in pgtable.c with possibly a single deferred TLBI vs 1 hcall and 1 TLBI per 4K page. But I guess this is really performance tuning, so this can always come later. Feel free to ignore :)
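
To put a rough number on the 1G comparison, here is a minimal back-of-the-envelope sketch in plain userspace C. It is not kernel code: struct teardown_cost, pages_in_range(), pkvm_cost() and kvm_cost() are hypothetical names made up for this illustration, and the model only encodes the costs stated above (one hcall and one TLBI per 4K page for pKVM, one pgtable.c call with possibly a single deferred TLBI for normal KVM).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cost model for illustration only; not kernel code. */
struct teardown_cost {
	uint64_t calls;	/* hypercalls (pKVM) or pgtable.c calls (normal KVM) */
	uint64_t tlbis;	/* TLB invalidations issued */
};

/* Number of pages covering a range that is a whole multiple of the page size. */
static uint64_t pages_in_range(uint64_t range_bytes, uint64_t page_bytes)
{
	assert(page_bytes != 0 && range_bytes % page_bytes == 0);
	return range_bytes / page_bytes;
}

/* pKVM teardown as described above: one hcall and one TLBI per page. */
static struct teardown_cost pkvm_cost(uint64_t range_bytes, uint64_t page_bytes)
{
	uint64_t n = pages_in_range(range_bytes, page_bytes);

	return (struct teardown_cost){ .calls = n, .tlbis = n };
}

/*
 * Normal KVM teardown as described above: one call, and (assumed here,
 * the text says "possibly") a single deferred TLBI for the whole range.
 */
static struct teardown_cost kvm_cost(void)
{
	return (struct teardown_cost){ .calls = 1, .tlbis = 1 };
}
```

Under these assumptions, pkvm_cost(1ULL << 30, 4096) gives 2^30 / 2^12 = 2^18 = 262144 hcalls and as many TLBIs for a 1G range, against a single call for kvm_cost(). That gap is why an extra reschedule point might be worth revisiting later.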