Re: [PATCH v6 0/8] KVM: mm: fd-based approach for supporting KVM guest private memory


From: Marc Orr <hidden>
Date: 2022-06-10 00:11:39
Also in: kvm, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel

On Tue, Jun 7, 2022 at 7:22 PM Chao Peng [off-list ref] wrote:

On Tue, Jun 07, 2022 at 05:55:46PM -0700, Marc Orr wrote:

quoted

On Tue, Jun 7, 2022 at 12:01 AM Chao Peng [off-list ref] wrote:

quoted

On Mon, Jun 06, 2022 at 01:09:50PM -0700, Vishal Annapurve wrote:

quoted

Private memory map/unmap and conversion
---------------------------------------
Userspace's map/unmap operations are done by calling fallocate() on the
backing store fd.
  - map: default fallocate() with mode=0.
  - unmap: fallocate() with FALLOC_FL_PUNCH_HOLE.
The map/unmap triggers the memfile_notifier_ops described above, letting
KVM map/unmap the secondary MMU page tables.

....

quoted

   QEMU: https://github.com/chao-p/qemu/tree/privmem-v6

An example QEMU command line for TDX test:
-object tdx-guest,id=tdx \
-object memory-backend-memfd-private,id=ram1,size=2G \
-machine q35,kvm-type=tdx,pic=no,kernel_irqchip=split,memory-encryption=tdx,memory-backend=ram1

There should be more discussion around double allocation scenarios
when using the private fd approach. A malicious guest or buggy
userspace VMM can cause physical memory to be allocated for both the
shared (memory accessible from host) and private fds backing the guest
memory.
The userspace VMM will need to unback the shared guest memory while
handling the conversion from shared to private in order to prevent
double allocation even with malicious guests or bugs in the userspace VMM.
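
The accounting problem described above can be made concrete with a toy model. This is only a sketch: GuestMemoryModel and its methods are hypothetical names, pages are tracked as plain sets of guest frame numbers, and nothing here touches KVM or a real backing store. It only shows how skipping the unback step during a shared-to-private conversion doubles the number of allocated pages.

```python
from dataclasses import dataclass, field


@dataclass
class GuestMemoryModel:
    """Toy accounting of pages backed by the shared and private stores."""
    shared: set = field(default_factory=set)
    private: set = field(default_factory=set)

    def convert_to_private(self, gfn, unback_shared=True):
        # Allocating in the private store stands in for fallocate(mode=0).
        self.private.add(gfn)
        if unback_shared:
            # Freeing the shared copy stands in for unbacking it.
            self.shared.discard(gfn)

    def allocated_pages(self):
        """Total pages charged to the VM across both stores."""
        return len(self.shared) + len(self.private)

    def double_allocated(self):
        """Guest frames that are currently backed in both stores."""
        return self.shared & self.private


# A VMM that unbacks shared memory on conversion stays within 4 pages.
careful = GuestMemoryModel(shared=set(range(4)))
for gfn in range(4):
    careful.convert_to_private(gfn)

# A VMM that forgets to unback ends up charged for 8 pages.
buggy = GuestMemoryModel(shared=set(range(4)))
for gfn in range(4):
    buggy.convert_to_private(gfn, unback_shared=False)
```

In this model the careful VMM ends with 4 allocated pages and the buggy one with 8, which is the kind of overshoot that would push a VM past a container memory limit.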

I don't know how a malicious guest can cause that. The initial design of
this series is to put the private/shared memory into two different
address spaces and give the userspace VMM the flexibility to convert
between the two. It can choose to respect the guest conversion request or
not.

For example, the guest could maliciously give a device driver a
private page so that a host-side virtual device will blindly write the
private page.

With this patch series, it's actually not even possible for the userspace VMM
to allocate a private page by a direct write; the page is basically unmapped
from userspace. If it really wants to, it has to do something special,
intentionally, and that's basically the conversion, which we should allow.

I think Vishal did a better job of explaining this scenario in his last
reply than I did.

quoted

It's possible for a userspace VMM to cause double allocation if it fails
to call the unback operation during the conversion; this may or may not be
a bug. Double allocation is not necessarily wrong, even conceptually.
At least TDX allows you to use half shared, half private in the guest, which
means both shared and private can be in effect. Unbacking the memory is just
the current QEMU implementation choice.

Right. But the idea is that this patch series should accommodate all
of the CVM architectures. Or at least that's what I know was
envisioned last time we discussed this topic for SNP [*].

AFAICS, this series should work for TDX, SNP, and other CVM
architectures. I don't see where TDX can work but SNP cannot, or did I
miss something here?

Agreed. I was just responding to the "At least TDX..." bit. Sorry for
any confusion.

quoted

Regardless, it's important to ensure that the VM respects its memory
budget. For example, within Google, we run VMs inside of containers.
So if we double allocate we're going to OOM. This seems acceptable for
an early version of CVMs. But ultimately, I think we need a more
robust way to ensure that the VM operates within its memory container.
Otherwise, the OOM is going to be hard to diagnose and distinguish
from a real OOM.

Thanks for bringing this up. But in my mind the userspace VMM can still
do this, and it's its responsibility to guarantee it if that is a hard
requirement. By design, the userspace VMM is the decision-maker for page
conversion and has all the necessary information to know which pages are
shared/private. It also has the necessary knobs to allocate/free the
physical pages for guest memory. Definitely, we should make the userspace
VMM more robust.

Vishal and Sean did a better job of articulating the concern in their
most recent replies.
