Thread: 97 messages, 14 authors, 2022-11-03

Re: [PATCH v8 2/8] KVM: Extend the memslot to support fd-based private memory

From: Sean Christopherson <seanjc@google.com>
Date: 2022-10-07 14:59:23
Also in: kvm, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel

On Fri, Oct 07, 2022, Jarkko Sakkinen wrote:
On Thu, Oct 06, 2022 at 03:34:58PM +0000, Sean Christopherson wrote:
On Thu, Oct 06, 2022, Jarkko Sakkinen wrote:
On Thu, Oct 06, 2022 at 05:58:03PM +0300, Jarkko Sakkinen wrote:
On Thu, Sep 15, 2022 at 10:29:07PM +0800, Chao Peng wrote:
This new extension, indicated by the new flag KVM_MEM_PRIVATE, adds two
additional KVM memslot fields, private_fd and private_offset, to allow
userspace to specify that guest private memory is provided from the
private_fd, with guest_phys_addr mapped at the private_offset of the
private_fd and spanning a range of memory_size.

The extended memslot can still have the userspace_addr(hva). When used, a
single memslot can maintain both private memory through the private
fd(private_fd/private_offset) and shared memory through the
hva(userspace_addr). Whether the private or shared part is visible to the
guest is maintained by other KVM code.
What is the appeal of the private_offset field anyway, instead of having just a
1:1 association between regions and files, i.e. one memfd per region?
Modifying memslots is slow, both in KVM and in QEMU (not sure about Google's VMM).
E.g. if a vCPU converts a single page, it will be forced to wait until all other
vCPUs drop SRCU, which can have severe latency spikes, e.g. if KVM is faulting in
memory.  KVM's memslot updates also hold a mutex for the entire duration of the
update, i.e. conversions on different vCPUs would be fully serialized, exacerbating
the SRCU problem.

KVM also has historical baggage where it "needs" to zap _all_ SPTEs when any
memslot is deleted.

Taking both a private_fd and a shared userspace address allows userspace to convert
between private and shared without having to manipulate memslots.
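
To make the point above concrete, here is a minimal sketch of a slot that
carries both backings and picks one per page. This is not KVM code: the
struct, the per-page is_private array and every demo_* name are hypothetical,
and real KVM tracks private vs. shared state in other code, not in the
memslot. The only thing it is meant to show is that a conversion flips
per-page state and leaves the slot itself alone.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_SLOT_PAGES 8

enum demo_backing { DEMO_SHARED_HVA, DEMO_PRIVATE_FD };

/* Hypothetical, simplified slot: one gpa range with two possible backings. */
struct demo_dual_slot {
	uint64_t guest_phys_addr;          /* first gpa covered by the slot */
	uint64_t userspace_addr;           /* hva backing the shared view */
	int private_fd;                    /* fd backing the private view */
	uint64_t private_offset;           /* offset in private_fd of guest_phys_addr */
	bool is_private[DEMO_SLOT_PAGES];  /* assumption: per-page state kept here */
};

struct demo_target {
	enum demo_backing backing;
	uint64_t addr_or_offset;           /* hva if shared, fd offset if private */
};

/* Convert one page; the slot layout is untouched. Caller keeps gpa in range. */
static void demo_convert(struct demo_dual_slot *slot, uint64_t gpa,
			 bool to_private)
{
	slot->is_private[(gpa - slot->guest_phys_addr) >> DEMO_PAGE_SHIFT] =
		to_private;
}

/* Resolve a gpa to whichever backing is currently in effect for its page. */
static struct demo_target demo_resolve(const struct demo_dual_slot *slot,
				       uint64_t gpa)
{
	uint64_t delta = gpa - slot->guest_phys_addr;
	struct demo_target t;

	if (slot->is_private[delta >> DEMO_PAGE_SHIFT]) {
		t.backing = DEMO_PRIVATE_FD;
		t.addr_or_offset = slot->private_offset + delta;
	} else {
		t.backing = DEMO_SHARED_HVA;
		t.addr_or_offset = slot->userspace_addr + delta;
	}
	return t;
}
```

With a slot at gpa 0x100000, converting the page at 0x102000 changes what
demo_resolve() returns for that page only; neighbouring pages keep resolving
to the hva.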
Right, this was a really good explanation, thank you.

I'm still wondering whether this could possibly work (or not):

1. Union userspace_addr and private_fd.
No, because userspace needs to be able to provide both userspace_addr (shared
memory) and private_fd (private memory) for a single memslot.
2. Instead of introducing private_offset, use guest_phys_addr as the
   offset.
No, because that would force userspace to use a single private_fd for all of guest
memory since it effectively means private_offset=0.  And userspace couldn't skip
over holes in guest memory, i.e. the size of the memfd would need to follow the
max guest gpa.  In other words, dropping private_offset could work, but it'd be
quite kludgy and not worth saving 8 bytes.
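
As a rough illustration of the offset argument, the sketch below compares the
two schemes for a guest with a hole in its physical address space. It is not
the KVM uAPI: struct demo_memslot and the demo_* helpers are hypothetical, and
the layout (2 GiB at gpa 0, 1 GiB at gpa 4 GiB) is an assumed example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a memslot; not the KVM uAPI struct. */
struct demo_memslot {
	uint64_t guest_phys_addr; /* first gpa covered by the slot */
	uint64_t memory_size;     /* bytes covered by the slot */
	uint64_t private_offset;  /* offset in private_fd of guest_phys_addr */
};

/* Translate a gpa to a private_fd offset; false if gpa is outside the slot. */
static bool demo_gpa_to_private_offset(const struct demo_memslot *slot,
				       uint64_t gpa, uint64_t *offset)
{
	if (gpa < slot->guest_phys_addr ||
	    gpa - slot->guest_phys_addr >= slot->memory_size)
		return false;

	*offset = slot->private_offset + (gpa - slot->guest_phys_addr);
	return true;
}

/* With private_offset, slots can be packed back to back in one fd. */
static uint64_t demo_fd_size_packed(const struct demo_memslot *slots, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += slots[i].memory_size;
	return total;
}

/* With gpa used as the offset, the fd must extend to the max guest gpa. */
static uint64_t demo_fd_size_gpa_as_offset(const struct demo_memslot *slots,
					   size_t n)
{
	uint64_t end = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t slot_end = slots[i].guest_phys_addr +
				    slots[i].memory_size;
		if (slot_end > end)
			end = slot_end;
	}
	return end;
}
```

For the assumed layout, the packed scheme needs a 3 GiB fd, while using the
gpa as the offset needs the fd to span 5 GiB, because the 2 GiB hole below
4 GiB cannot be skipped.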