Re: [PATCH v9 7/8] KVM: Handle page fault for private memory

From: Ackerley Tng <hidden>
Date: 2022-11-16 20:50:31
Also in: kvm, linux-arch, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel

A memslot with KVM_MEM_PRIVATE being set can include both fd-based
private memory and hva-based shared memory. Architecture code (like TDX
code) can tell whether the ongoing fault is private or not. This patch
adds an 'is_private' field to kvm_page_fault to indicate this, and
architecture code is expected to set it.

To handle a page fault for such a memslot, the handling logic differs
depending on whether the fault is private or shared. KVM checks whether
'is_private' matches the host's view of the page (maintained in
mem_attr_array).
  - For a successful match, private pfn is obtained with
    restrictedmem_get_page() from private fd and shared pfn is obtained
    with existing get_user_pages().
  - For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to
    userspace. Userspace then can convert memory between private/shared
    in host's view and retry the fault.
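
The match/mismatch rule above can be sketched as a small userspace model.
This is only an illustration, not the patch's code: every name below
(route_fault, struct sketch_fault, struct sketch_exit, SKETCH_*) is
hypothetical, and a 4 KiB base page is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT	12		/* assumption: 4 KiB base pages */
#define SKETCH_FLAG_PRIVATE	(1u << 0)	/* stand-in for KVM_MEMORY_EXIT_FLAG_PRIVATE */

/* Hypothetical outcomes for this sketch; these are not KVM's RET_PF_* values. */
enum fault_route {
	ROUTE_EXIT_TO_USERSPACE,	/* mismatch: report a memory fault exit */
	ROUTE_PRIVATE_FD,		/* match, private: pfn comes from the private fd */
	ROUTE_SHARED_HVA,		/* match, shared: pfn comes from the hva path */
};

struct sketch_fault {
	uint64_t gfn;
	bool is_private;	/* set by architecture code in the patch */
};

struct sketch_exit {
	uint32_t flags;
	uint64_t gpa;
	uint64_t size;
};

/*
 * host_is_private models the host's view of the page (mem_attr_array in
 * the patch). exit_info is only written when the two views disagree.
 */
static enum fault_route route_fault(const struct sketch_fault *fault,
				    bool host_is_private,
				    struct sketch_exit *exit_info)
{
	assert(fault && exit_info);

	if (fault->is_private != host_is_private) {
		exit_info->flags = fault->is_private ? SKETCH_FLAG_PRIVATE : 0;
		exit_info->gpa = fault->gfn << SKETCH_PAGE_SHIFT;
		exit_info->size = UINT64_C(1) << SKETCH_PAGE_SHIFT;
		return ROUTE_EXIT_TO_USERSPACE;
	}

	return fault->is_private ? ROUTE_PRIVATE_FD : ROUTE_SHARED_HVA;
}
```

On ROUTE_EXIT_TO_USERSPACE, userspace would convert the page in the host's
view and re-enter the guest so that the fault is retried.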

Co-developed-by: Yu Zhang <redacted>
Signed-off-by: Yu Zhang <redacted>
Signed-off-by: Chao Peng <redacted>
---
 arch/x86/kvm/mmu/mmu.c          | 56 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/mmu/mmu_internal.h | 14 ++++++++-
 arch/x86/kvm/mmu/mmutrace.h     |  1 +
 arch/x86/kvm/mmu/spte.h         |  6 ++++
 arch/x86/kvm/mmu/tdp_mmu.c      |  3 +-
 include/linux/kvm_host.h        | 28 +++++++++++++++++
 6 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 67a9823a8c35..10017a9f26ee 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c

@@ -3030,7 +3030,7 @@ static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn,

 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn,
-			      int max_level)
+			      int max_level, bool is_private)
 {
 	struct kvm_lpage_info *linfo;
 	int host_level;

@@ -3042,6 +3042,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			break;
 	}

+	if (is_private)
+		return max_level;
+
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;

@@ -3070,7 +3073,8 @@ void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	 * level, which will be used to do precise, accurate accounting.
 	 */
 	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
-						     fault->gfn, fault->max_level);
+						     fault->gfn, fault->max_level,
+						     fault->is_private);
 	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
 		return;

@@ -4141,6 +4145,32 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work)
 	kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true);
 }

+static inline u8 order_to_level(int order)
+{
+	BUILD_BUG_ON(KVM_MAX_HUGEPAGE_LEVEL > PG_LEVEL_1G);
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_1G))
+		return PG_LEVEL_1G;
+
+	if (order >= KVM_HPAGE_GFN_SHIFT(PG_LEVEL_2M))
+		return PG_LEVEL_2M;
+
+	return PG_LEVEL_4K;
+}
+
+static int kvm_faultin_pfn_private(struct kvm_page_fault *fault)
+{
+	int order;
+	struct kvm_memory_slot *slot = fault->slot;
+
+	if (kvm_restricted_mem_get_pfn(slot, fault->gfn, &fault->pfn, &order))
+		return RET_PF_RETRY;
+
+	fault->max_level = min(order_to_level(order), fault->max_level);
+	fault->map_writable = !(slot->flags & KVM_MEM_READONLY);
+	return RET_PF_CONTINUE;
+}
+

static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
{
	struct kvm_memory_slot *slot = fault->slot;

@@ -4173,6 +4203,22 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)

			return RET_PF_EMULATE;
	}

+	if (kvm_slot_can_be_private(slot) &&
+	    fault->is_private != kvm_mem_is_private(vcpu->kvm, fault->gfn)) {
+		vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+		if (fault->is_private)
+			vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
+		else
+			vcpu->run->memory.flags = 0;
+		vcpu->run->memory.padding = 0;
+		vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
+		vcpu->run->memory.size = PAGE_SIZE;
+		return RET_PF_USER;
+	}
+
+	if (fault->is_private)
+		return kvm_faultin_pfn_private(fault);
+

Not every memslot is backed by restricted memory, so we should also
check that the memslot has been set up for private memory before taking
the private path:

	if (fault->is_private && kvm_slot_can_be_private(slot))
		return kvm_faultin_pfn_private(fault);

Without this check, restrictedmem_get_page() is called with
slot->restricted_file == NULL, which causes a NULL pointer dereference.
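
To make the failure mode concrete, here is a minimal userspace sketch of
the guarded dispatch. All types and names (struct sketch_slot,
SKETCH_MEM_PRIVATE, pick_path) are hypothetical stand-ins, and it assumes
that a slot carries a restricted file only when its private flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MEM_PRIVATE	(1u << 2)	/* stand-in for KVM_MEM_PRIVATE; value is arbitrary */

struct sketch_file {
	int unused;
};

struct sketch_slot {
	uint32_t flags;
	struct sketch_file *restricted_file;	/* NULL when the slot has no private backing */
};

enum sketch_path { PATH_PRIVATE, PATH_SHARED };

/* Models the role of kvm_slot_can_be_private() for this sketch only. */
static bool slot_can_be_private(const struct sketch_slot *slot)
{
	return slot && (slot->flags & SKETCH_MEM_PRIVATE);
}

/*
 * Take the private path only when the fault is private AND the slot was
 * set up for private memory. Everything else falls through to the shared
 * path, so restricted_file is never touched while it is NULL.
 */
static enum sketch_path pick_path(const struct sketch_slot *slot,
				  bool fault_is_private)
{
	if (fault_is_private && slot_can_be_private(slot)) {
		assert(slot->restricted_file != NULL);
		return PATH_PRIVATE;
	}

	return PATH_SHARED;
}
```

With the unguarded form, a private fault on a slot without the flag would
reach the private path and dereference the NULL restricted_file.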

	async = false;
	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
					  fault->write, &fault->map_writable,

@@ -5557,6 +5603,9 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err

			return -EIO;
	}

+	if (r == RET_PF_USER)
+		return 0;
+
	if (r < 0)
		return r;
	if (r != RET_PF_EMULATE)

@@ -6408,7 +6457,8 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,

		 */
		if (sp->role.direct &&
		    sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
-							       PG_LEVEL_NUM)) {
+							       PG_LEVEL_NUM,
+							       false)) {
			kvm_zap_one_rmap_spte(kvm, rmap_head, sptep);

			if (kvm_available_flush_tlb_with_range())
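
For reference, the order-to-level mapping that order_to_level() and
kvm_faultin_pfn_private() perform in the quoted patch can be modeled in
userspace as below. The constants are assumptions that mirror the usual
x86 definitions (4K = level 1, 2M = level 2, 1G = level 3, nine gfn bits
per level); the sketch_* and SKETCH_* names are hypothetical.

```c
#include <assert.h>

enum sketch_level { SKETCH_LEVEL_4K = 1, SKETCH_LEVEL_2M = 2, SKETCH_LEVEL_1G = 3 };

/* Assumed to match KVM_HPAGE_GFN_SHIFT() on x86: 0, 9 and 18 gfn bits. */
#define SKETCH_GFN_SHIFT(level)	(((level) - 1) * 9)

/* Largest mapping level that a backing allocation of 2^order pages allows. */
static int sketch_order_to_level(int order)
{
	assert(order >= 0);

	if (order >= SKETCH_GFN_SHIFT(SKETCH_LEVEL_1G))
		return SKETCH_LEVEL_1G;

	if (order >= SKETCH_GFN_SHIFT(SKETCH_LEVEL_2M))
		return SKETCH_LEVEL_2M;

	return SKETCH_LEVEL_4K;
}

/* Mirrors min(order_to_level(order), fault->max_level) from the patch. */
static int sketch_clamp_max_level(int order, int max_level)
{
	int level = sketch_order_to_level(order);

	return level < max_level ? level : max_level;
}
```

Under these assumptions an order-9 backing page permits a 2M mapping, but
the result is still capped by whatever max_level the fault already carries.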
