Thread (97 messages) 97 messages, 14 authors, 2022-11-03

Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd

From: Sean Christopherson <seanjc@google.com>
Date: 2022-09-27 22:47:40
Also in: kvm, linux-api, linux-fsdevel, linux-mm, lkml, qemu-devel

On Mon, Sep 26, 2022, Fuad Tabba wrote:
Hi,

On Mon, Sep 26, 2022 at 3:28 PM Chao Peng [off-list ref] wrote:
quoted
On Fri, Sep 23, 2022 at 04:19:46PM +0100, Fuad Tabba wrote:
quoted
quoted
Then on the KVM side, its mmap_start() + mmap_end() sequence would:

  1. Not be supported for TDX or SEV-SNP because they don't allow adding non-zero
     memory into the guest (after pre-boot phase).

  2. Be mutually exclusive with shared<=>private conversions, and is allowed if
     and only if the entire gfn range of the associated memslot is shared.
In general I think that this would work with pKVM. However, limiting
private<->shared conversions to the granularity of a whole memslot
might be difficult to handle in pKVM, since the guest doesn't have the
concept of memslots. For example, in pKVM right now, when a guest
shares back its restricted DMA pool with the host it does so at the
page-level.
Y'all are killing me :-)

Isn't the guest enlightened?  E.g. can't you tell the guest "thou shalt share at
granularity X"?  With KVM's newfangled scalable memslots and per-vCPU MRU slot,
X doesn't even have to be that high to get reasonable performance, e.g. assuming
the DMA pool is at most 2GiB, that's "only" 1024 memslots, which is supposed to
work just fine in KVM.
quoted
quoted
pKVM would also need a way to make an fd accessible again
when shared back, which I think isn't possible with this patch.
But does pKVM really want to mmap/munmap a new region at the page-level,
that can cause VMA fragmentation if the conversion is frequent as I see.
Even with a KVM ioctl for mapping as mentioned below, I think there will
be the same issue.
pKVM doesn't really need to unmap the memory. What is really important
is that the memory is not GUP'able.
Well, not entirely unguppable, just unguppable without a magic FOLL_* flag,
otherwise KVM wouldn't be able to get the PFN to map into guest memory.

The problem is that gup() and "mapped" are tied together.  So yes, pKVM doesn't
strictly need to unmap memory _in the untrusted host_, but since mapped==guppable,
the end result is the same.

Emphasis above because pKVM still needs unmap the memory _somehwere_.  IIUC, the
current approach is to do that only in the stage-2 page tables, i.e. only in the
context of the hypervisor.  Which is also the source of the gup() problems; the
untrusted kernel is blissfully unaware that the memory is inaccessible.

Any approach that moves some of that information into the untrusted kernel so that
the kernel can protect itself will incur fragmentation in the VMAs.  Well, unless
all of guest memory becomes unguppable, but that's likely not a viable option.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help