Re: [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create... | linux-doc

Re: [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory

From: Michael Roth <hidden>
Date: 2022-11-29 19:07:22
Also in: kvm, linux-api, linux-arch, linux-fsdevel, linux-mm, lkml, qemu-devel

On Tue, Nov 29, 2022 at 10:06:15PM +0800, Chao Peng wrote:

On Mon, Nov 28, 2022 at 06:37:25PM -0600, Michael Roth wrote:

On Tue, Oct 25, 2022 at 11:13:37PM +0800, Chao Peng wrote:

...

+static long restrictedmem_fallocate(struct file *file, int mode,
+				    loff_t offset, loff_t len)
+{
+	struct restrictedmem_data *data = file->f_mapping->private_data;
+	struct file *memfd = data->memfd;
+	int ret;
+
+	if (mode & FALLOC_FL_PUNCH_HOLE) {
+		if (!PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len))
+			return -EINVAL;
+	}
+
+	restrictedmem_notifier_invalidate(data, offset, offset + len, true);

The KVM restrictedmem ops seem to expect pgoff_t, but here we pass
loff_t. For SNP we've made this change as part of the following patch
and it seems to produce the expected behavior:

That's correct. Thanks.

  https://github.com/mdroth/linux/commit/d669c7d3003ff7a7a47e73e8c3b4eeadbd2c4eb6
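
The unit mismatch described above can be illustrated with a small userspace sketch. It assumes 4 KiB pages and page-aligned input, and every `sketch_` name is hypothetical; it is not taken from the patch or from the linked commit, which may handle this differently.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT is architecture-specific. */
#define SKETCH_PAGE_SHIFT 12

typedef int64_t  sketch_loff_t;   /* byte offset, stands in for loff_t */
typedef uint64_t sketch_pgoff_t;  /* page index, stands in for pgoff_t */

struct sketch_pg_range {
	sketch_pgoff_t start;  /* first page index in the range */
	sketch_pgoff_t end;    /* one past the last page index */
};

/*
 * Convert a page-aligned byte range [offset, offset + len) into the
 * page-index range [start, end) that a pgoff_t-based callback expects.
 * Unaligned input is rounded down here, which a real caller would
 * have to decide how to handle.
 */
static struct sketch_pg_range sketch_bytes_to_pages(sketch_loff_t offset,
						    sketch_loff_t len)
{
	struct sketch_pg_range r;

	r.start = (sketch_pgoff_t)offset >> SKETCH_PAGE_SHIFT;
	r.end = (sketch_pgoff_t)(offset + len) >> SKETCH_PAGE_SHIFT;
	return r;
}
```

Under these assumptions, a byte range starting at 0x2000 with length 0x3000 maps to page indices 2 through 4 (end index 5), whereas passing the byte values through unconverted would make the notifier treat them as page indices far beyond the intended range.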

+	ret = memfd->f_op->fallocate(memfd, mode, offset, len);
+	restrictedmem_notifier_invalidate(data, offset, offset + len, false);
+	return ret;
+}
+

<snip>

+int restrictedmem_get_page(struct file *file, pgoff_t offset,
+			   struct page **pagep, int *order)
+{
+	struct restrictedmem_data *data = file->f_mapping->private_data;
+	struct file *memfd = data->memfd;
+	struct page *page;
+	int ret;
+
+	ret = shmem_getpage(file_inode(memfd), offset, &page, SGP_WRITE);

This will result in KVM allocating pages that userspace hasn't necessarily
fallocate()'d. In the case of SNP we need to get the PFN so we can clean
up the RMP entries when restrictedmem invalidations are issued for a GFN
range.

Yes, fallocate() is unnecessary unless someone wants to reserve some
space (e.g. for determinism or performance), which matches its
documented semantics:
https://www.man7.org/linux/man-pages/man2/fallocate.2.html

If the guest supports lazy-acceptance however, these pages may not have
been faulted in yet, and if the VMM defers actually fallocate()'ing space
until the guest actually tries to issue a shared->private for that GFN
(to support lazy-pinning), then there may never be a need to allocate
pages for these backends.

However, the restrictedmem invalidations are for GFN ranges so there's
no way to know in advance whether it's been allocated yet or not. The
xarray is one option but currently it defaults to 'private' so that
doesn't help us here. It might if we introduced an 'uninitialized' state
or something along that line instead of just the binary
'shared'/'private' though...

How about if we change the default to 'shared' as we discussed at
https://lore.kernel.org/all/Y35gI0L8GMt9+OkK@google.com/ ?

Need to look at this a bit more, but I think that could work as well.

But for now we added a restrictedmem_get_page_noalloc() that uses
SGP_NONE instead of SGP_WRITE to avoid accidentally allocating a bunch
of memory as part of guest shutdown, and a
kvm_restrictedmem_get_pfn_noalloc() variant to go along with that. But
maybe a boolean param is better? Or maybe SGP_NOALLOC is the better
default, and we just propagate an error to userspace if they didn't
fallocate() in advance?
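
The trade-off between a separate no-alloc variant and a boolean parameter can be sketched with a toy userspace model. Nothing here is kernel API: `toy_backing`, `toy_get_page()` and the `may_alloc` flag are hypothetical stand-ins for the backing store, the lookup helper and the allocate-on-lookup choice discussed above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 8

/* Toy backing store: tracks which page indices have been allocated. */
struct toy_backing {
	bool present[TOY_NPAGES];
	size_t allocated;  /* number of pages allocated so far */
};

/*
 * Look up a page by index.
 *
 * With may_alloc == true the page is created on demand, so a lookup can
 * grow memory usage. With may_alloc == false a missing page is reported
 * as -ENOENT and nothing is allocated, which is the behaviour wanted on
 * teardown paths that only need to find pages that already exist.
 */
static int toy_get_page(struct toy_backing *b, size_t index, bool may_alloc)
{
	if (index >= TOY_NPAGES)
		return -EINVAL;

	if (!b->present[index]) {
		if (!may_alloc)
			return -ENOENT;
		b->present[index] = true;
		b->allocated++;
	}
	return 0;
}
```

In this model a single helper covers both callers, at the cost of every call site having to state its intent. Two separately named helpers make the intent visible in the function name instead.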

Making fallocate() a hard requirement not only complicates userspace
but also forces lazy faulting to go through the long path of exiting to
userspace. Unless we have no other option, I would not go this way.

Unless I'm missing something, it's already the case that userspace is
responsible for handling all the shared->private transitions in response
to KVM_EXIT_MEMORY_FAULT or (in our case) KVM_EXIT_VMGEXIT. So the only
additional requirement on the VMM is that, if it *doesn't* preallocate,
it will need to issue the fallocate() prior to issuing the
KVM_MEM_ENCRYPT_REG_REGION ioctl in response to these events.
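
A rough userspace sketch of the fallocate() step follows. It uses an ordinary memfd as a stand-in, since memfd_restricted() is only proposed in this series, and it leaves out the KVM ioctl entirely. The `sketch_` helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Reserve backing pages for [offset, offset + len) before the range is
 * handed to the guest as private memory. Returns 0 on success, or -1
 * with errno set. The follow-up ioctl that registers the range with
 * KVM is deliberately not shown.
 */
static int sketch_reserve_range(int fd, off_t offset, off_t len)
{
	return fallocate(fd, 0, offset, len);
}

/*
 * Release the backing pages again, e.g. on a private->shared
 * conversion. FALLOC_FL_PUNCH_HOLE must be combined with
 * FALLOC_FL_KEEP_SIZE, so the file size is unchanged.
 */
static int sketch_release_range(int fd, off_t offset, off_t len)
{
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, len);
}
```

A VMM that preallocates everything would call the reserve helper once over the whole file at startup. One that allocates lazily would call it per range, immediately before handling the conversion request.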

QEMU for example already has a separate 'prealloc' option for cases
where they want to prefault all the guest memory, so it makes sense to
continue making that an optional thing with regard to UPM.

-Mike

Chao

-Mike

+	if (ret)
+		return ret;
+
+	*pagep = page;
+	if (order)
+		*order = thp_order(compound_head(page));
+
+	SetPageUptodate(page);
+	unlock_page(page);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(restrictedmem_get_page);
-- 
2.25.1