Re: [PATCH v6 0/8] KVM: mm: fd-based approach for supporting KVM guest private memory


From: Andy Lutomirski <luto@kernel.org>
Date: 2022-06-14 21:00:03
Also in: kvm, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel

On Tue, Jun 14, 2022 at 12:09 PM Sean Christopherson [off-list ref] wrote:

On Tue, Jun 14, 2022, Andy Lutomirski wrote:

On Tue, Jun 14, 2022 at 12:32 AM Chao Peng [off-list ref] wrote:

On Thu, Jun 09, 2022 at 08:29:06PM +0000, Sean Christopherson wrote:

On Wed, Jun 08, 2022, Vishal Annapurve wrote:

One argument is that userspace can simply rely on cgroups to detect misbehaving
guests, but (a) those types of OOMs will be a nightmare to debug and (b) an OOM
kill from the host is typically considered a _host_ issue and will be treated as
a missed SLO.

An idea for handling this in the kernel without too much complexity would be to
add F_SEAL_FAULT_ALLOCATIONS (terrible name) that would prevent page faults from
allocating pages, i.e. holes can only be filled by an explicit fallocate().  Minor
faults, e.g. due to NUMA balancing stupidity, and major faults due to swap would
still work, but writes to previously unreserved/unallocated memory would get a
SIGSEGV on something it has mapped.  That would allow the userspace VMM to prevent
unintentional allocations without having to coordinate unmapping/remapping across
multiple processes.

Since this is mainly for shared memory and the motivation is catching
misbehaving accesses, can we use mprotect(PROT_NONE) for this? We can mark
the ranges backed by the private fd as PROT_NONE during the conversion so
that subsequent misbehaving accesses are blocked instead of silently
causing double allocation.

PROT_NONE, a.k.a. mprotect(), has the same vma downsides as munmap().

This patch series is fairly close to implementing a rather more
efficient solution.  I'm not familiar enough with hypervisor userspace
to really know if this would work, but:

What if shared guest memory could also be file-backed, either in the
same fd or with a second fd covering the shared portion of a memslot?
This would allow changes to the backing store (punching holes, etc) to
be done without mmap_lock or host-userspace TLB flushes?  Depending on
what the guest is doing with its shared memory, userspace might need
the memory mapped or it might not.

That's what I'm angling for with the F_SEAL_FAULT_ALLOCATIONS idea.  The issue,
unless I'm misreading code, is that punching a hole in the shared memory backing
store doesn't prevent reallocating that hole on fault, i.e. a helper process that
keeps a valid mapping of guest shared memory can silently fill the hole.

What we're hoping to achieve is a way to prevent allocating memory without a very
explicit action from userspace, e.g. fallocate().

Ah, I misunderstood.  I thought your goal was to mmap it and prevent
page faults from allocating.

It is indeed the case (and has been since before quite a few of us
were born) that a hole in a sparse file is logically just a bunch of
zeros.  A way to make a file for which a hole is an actual hole seems
like it would solve this problem nicely.  It could also be solved more
specifically for KVM by making sure that the private/shared mode that
userspace programs is strict enough to prevent accidental allocations
-- if a GPA is definitively private, shared, neither, or (potentially,
on TDX only) both, then a page that *isn't* shared will never be
accidentally allocated by KVM.  If the shared backing is not mmapped,
it also won't be accidentally allocated by host userspace on a stray
or careless write.


--Andy
