Re: [PATCH v10 9/9] KVM: Enable and expose KVM_MEM_PRIVATE
From: Chao Peng <hidden>
Date: 2023-03-28 10:49:22
Also in:
kvm, linux-arch, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel
Possibly related (same subject, not in this thread)
- 2023-04-25 · Re: [PATCH v10 9/9] KVM: Enable and expose KVM_MEM_PRIVATE · Sean Christopherson <seanjc@google.com>
- 2023-04-18 · Re: [PATCH v10 9/9] KVM: Enable and expose KVM_MEM_PRIVATE · Ackerley Tng <hidden>
- 2023-03-07 · Re: [PATCH v10 9/9] KVM: Enable and expose KVM_MEM_PRIVATE · Sean Christopherson <seanjc@google.com>
- 2023-03-07 · Re: [PATCH v10 9/9] KVM: Enable and expose KVM_MEM_PRIVATE · Ackerley Tng <hidden>
- 2023-01-28 · Re: [PATCH v10 9/9] KVM: Enable and expose KVM_MEM_PRIVATE · Chao Peng <hidden>
On Fri, Mar 24, 2023 at 10:29:25AM +0800, Xiaoyao Li wrote:
On 3/24/2023 10:10 AM, Chao Peng wrote:quoted
On Wed, Mar 22, 2023 at 05:41:31PM -0700, Isaku Yamahata wrote:quoted
On Wed, Mar 08, 2023 at 03:40:26PM +0800, Chao Peng [off-list ref] wrote:quoted
On Wed, Mar 08, 2023 at 12:13:24AM +0000, Ackerley Tng wrote:quoted
Chao Peng [off-list ref] writes:quoted
On Sat, Jan 14, 2023 at 12:01:01AM +0000, Sean Christopherson wrote:quoted
On Fri, Dec 02, 2022, Chao Peng wrote:...quoted
Strongly prefer to use similar logic to existing code that detects wraps:quoted
quoted
mem->restricted_offset + mem->memory_size < mem->restricted_offsetquoted
quoted
This is also where I'd like to add the "gfn is aligned to offset" check, though my brain is too fried to figure that out right now.quoted
Used count_trailing_zeros() for this TODO, unsure we have other better approach.quoted
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index afc8c26fa652..fd34c5f7cd2f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c@@ -56,6 +56,7 @@ #include <asm/processor.h> #include <asm/ioctl.h> #include <linux/uaccess.h> +#include <linux/count_zeros.h>quoted
#include "coalesced_mmio.h" #include "async_pf.h"@@ -2087,6 +2088,19 @@ static bool kvm_check_memslot_overlap(structkvm_memslots *slots, int id, return false; }quoted
+/* + * Return true when ALIGNMENT(offset) >= ALIGNMENT(gpa). + */ +static bool kvm_check_rmem_offset_alignment(u64 offset, u64 gpa) +{ + if (!offset) + return true; + if (!gpa) + return false; + + return !!(count_trailing_zeros(offset) >= count_trailing_zeros(gpa));This check doesn't work expected. For example, offset = 2GB, gpa=4GB this check fails.This case is expected to fail as Sean initially suggested[*]: I would rather reject memslot if the gfn has lesser alignment than the offset. I'm totally ok with this approach _if_ there's a use case. Until such a use case presents itself, I would rather be conservative from a uAPI perspective. I understand that we put tighter restriction on this but if you see such restriction is really a big issue for real usage, instead of a theoretical problem, then we can loosen the check here. But at that time below code is kind of x86 specific and may need improve. BTW, in latest code, I replaced count_trailing_zeros() with fls64(): return !!(fls64(offset) >= fls64(gpa));wouldn't it be !!(ffs64(offset) <= ffs64(gpa)) ?
As the function document explains, here we want to return true when ALIGNMENT(offset) >= ALIGNMENT(gpa), so '>=' is what we need. It's worthy clarifying that in Sean's original suggestion he actually mentioned the opposite. He said 'reject memslot if the gfn has lesser alignment than the offset', but I wonder this is his purpose, since if ALIGNMENT(offset) < ALIGNMENT(gpa), we wouldn't be possible to map the page as largepage. Consider we have below config: gpa=2M, offset=1M In this case KVM tries to map gpa at 2M as 2M hugepage but the physical page at the offset(1M) in private_fd cannot provide the 2M page due to misalignment. But as we discussed in the off-list thread, here we do find a real use case indicating this check is too strict. i.e. QEMU immediately fails when launch a guest > 2G memory. For this case QEMU splits guest memory space into two slots: Slot#1(ram_below_4G): gpa=0x0, offset=0x0, size=2G Slot#2(ram_above_4G): gpa=4G, offset=2G, size=totalsize-2G This strict alignment check fails for slot#2 because offset(2G) has less alignment than gpa(4G). To allow this, one solution can revert to my previous change in kvm_alloc_memslot_metadata() to disallow hugepage only when the offset/gpa are not aligned to related page size. Sean, How do you think? Chao
quoted
[*] https://lore.kernel.org/all/Y8HldeHBrw+OOZVm@google.com/ (local) Chaoquoted
I come up with the following.quoted
From ec87e25082f0497431b732702fae82c6a05071bf Mon Sep 17 00:00:00 2001Message-Id: [off-list ref] From: Isaku Yamahata <redacted> Date: Wed, 22 Mar 2023 15:32:56 -0700 Subject: [PATCH] KVM: Relax alignment check for restricted mem kvm_check_rmem_offset_alignment() only checks based on offset alignment and GPA alignment. However, the actual alignment for offset depends on architecture. For x86 case, it can be 1G, 2M or 4K. So even if GPA is aligned for 1G+, only 1G-alignment is required for offset. Without this patch, gpa=4G, offset=2G results in failure of memory slot creation. Fixes: edc8814b2c77 ("KVM: Require gfn be aligned with restricted offset") Signed-off-by: Isaku Yamahata <redacted> --- arch/x86/include/asm/kvm_host.h | 15 +++++++++++++++ virt/kvm/kvm_main.c | 9 ++++++++- 2 files changed, 23 insertions(+), 1 deletion(-)diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 88e11dd3afde..03af44650f24 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h@@ -16,6 +16,7 @@ #include <linux/irq_work.h> #include <linux/irq.h> #include <linux/workqueue.h> +#include <linux/count_zeros.h> #include <linux/kvm.h> #include <linux/kvm_para.h>@@ -143,6 +144,20 @@ #define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1)) #define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE) +#define kvm_arch_required_alignment kvm_arch_required_alignment +static inline int kvm_arch_required_alignment(u64 gpa) +{ + int zeros = count_trailing_zeros(gpa); + + WARN_ON_ONCE(!PAGE_ALIGNED(gpa)); + if (zeros >= KVM_HPAGE_SHIFT(PG_LEVEL_1G)) + return KVM_HPAGE_SHIFT(PG_LEVEL_1G); + else if (zeros >= KVM_HPAGE_SHIFT(PG_LEVEL_2M)) + return KVM_HPAGE_SHIFT(PG_LEVEL_2M); + + return PAGE_SHIFT; +} + #define KVM_MEMSLOT_PAGES_TO_MMU_PAGES_RATIO 50 #define KVM_MIN_ALLOC_MMU_PAGES 64UL #define KVM_MMU_HASH_SHIFT 12diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index c9c4eef457b0..f4ff96171d24 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c@@ -2113,6 +2113,13 @@ static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id, return false; } +#ifndef kvm_arch_required_alignment +__weak int kvm_arch_required_alignment(u64 gpa) +{ + return PAGE_SHIFT +} +#endif + /* * Return true when ALIGNMENT(offset) >= ALIGNMENT(gpa). */@@ -2123,7 +2130,7 @@ static bool kvm_check_rmem_offset_alignment(u64 offset, u64 gpa) if (!gpa) return false; - return !!(count_trailing_zeros(offset) >= count_trailing_zeros(gpa)); + return !!(count_trailing_zeros(offset) >= kvm_arch_required_alignment(gpa)); } /*-- 2.25.1 -- Isaku Yamahata <isaku.yamahata@gmail.com>