Thread (58 messages) 58 messages, 10 authors, 2022-08-25

Re: [PATCH v6 0/8] KVM: mm: fd-based approach for supporting KVM guest private memory

From: Sean Christopherson <seanjc@google.com>
Date: 2022-06-14 19:09:21
Also in: kvm, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel

On Tue, Jun 14, 2022, Andy Lutomirski wrote:
On Tue, Jun 14, 2022 at 12:32 AM Chao Peng [off-list ref] wrote:
quoted
On Thu, Jun 09, 2022 at 08:29:06PM +0000, Sean Christopherson wrote:
quoted
On Wed, Jun 08, 2022, Vishal Annapurve wrote:

One argument is that userspace can simply rely on cgroups to detect misbehaving
guests, but (a) those types of OOMs will be a nightmare to debug and (b) an OOM
kill from the host is typically considered a _host_ issue and will be treated as
a missed SLO.

An idea for handling this in the kernel without too much complexity would be to
add F_SEAL_FAULT_ALLOCATIONS (terrible name) that would prevent page faults from
allocating pages, i.e. holes can only be filled by an explicit fallocate().  Minor
faults, e.g. due to NUMA balancing stupidity, and major faults due to swap would
still work, but writes to previously unreserved/unallocated memory would get a
SIGSEGV on something it has mapped.  That would allow the userspace VMM to prevent
unintentional allocations without having to coordinate unmapping/remapping across
multiple processes.
Since this is mainly for shared memory and the motivation is catching
misbehaved access, can we use mprotect(PROT_NONE) for this? We can mark
those range backed by private fd as PROT_NONE during the conversion so
subsequence misbehaved accesses will be blocked instead of causing double
allocation silently.
PROT_NONE, a.k.a. mprotect(), has the same vma downsides as munmap().
 
This patch series is fairly close to implementing a rather more
efficient solution.  I'm not familiar enough with hypervisor userspace
to really know if this would work, but:

What if shared guest memory could also be file-backed, either in the
same fd or with a second fd covering the shared portion of a memslot?
This would allow changes to the backing store (punching holes, etc) to
be some without mmap_lock or host-userspace TLB flushes?  Depending on
what the guest is doing with its shared memory, userspace might need
the memory mapped or it might not.
That's what I'm angling for with the F_SEAL_FAULT_ALLOCATIONS idea.  The issue,
unless I'm misreading code, is that punching a hole in the shared memory backing
store doesn't prevent reallocating that hole on fault, i.e. a helper process that
keeps a valid mapping of guest shared memory can silently fill the hole.

What we're hoping to achieve is a way to prevent allocating memory without a very
explicit action from userspace, e.g. fallocate().
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help