Re: [PATCH v8 2/8] KVM: Extend the memslot to support fd-based private memory
From: Chao Peng <hidden>
Date: 2022-10-10 08:29:58
Also in:
kvm, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel
On Sat, Oct 08, 2022 at 08:35:47PM +0300, Jarkko Sakkinen wrote:
On Sat, Oct 08, 2022 at 07:15:17PM +0300, Jarkko Sakkinen wrote:
On Sat, Oct 08, 2022 at 12:54:32AM +0300, Jarkko Sakkinen wrote:
On Fri, Oct 07, 2022 at 02:58:54PM +0000, Sean Christopherson wrote:
On Fri, Oct 07, 2022, Jarkko Sakkinen wrote:
On Thu, Oct 06, 2022 at 03:34:58PM +0000, Sean Christopherson wrote:
On Thu, Oct 06, 2022, Jarkko Sakkinen wrote:
On Thu, Oct 06, 2022 at 05:58:03PM +0300, Jarkko Sakkinen wrote:
On Thu, Sep 15, 2022 at 10:29:07PM +0800, Chao Peng wrote:
This new extension, indicated by the new flag KVM_MEM_PRIVATE, adds two additional KVM memslot fields, private_fd/private_offset, to allow userspace to specify that guest private memory is provided from the private_fd, with guest_phys_addr mapped at the private_offset of the private_fd and spanning a range of memory_size. The extended memslot can still have the userspace_addr (hva). When used, a single memslot can maintain both private memory through the private fd (private_fd/private_offset) and shared memory through the hva (userspace_addr). Whether the private or shared part is visible to the guest is maintained by other KVM code.

What is the appeal of the private_offset field anyway, instead of having just a 1:1 association between regions and files, i.e. one memfd per region?

Modifying memslots is slow, both in KVM and in QEMU (not sure about Google's VMM). E.g. if a vCPU converts a single page, it will be forced to wait until all other vCPUs drop SRCU, which can have severe latency spikes, e.g. if KVM is faulting in memory. KVM's memslot updates also hold a mutex for the entire duration of the update, i.e. conversions on different vCPUs would be fully serialized, exacerbating the SRCU problem. KVM also has historical baggage where it "needs" to zap _all_ SPTEs when any memslot is deleted. Taking both a private_fd and a shared userspace address allows userspace to convert between private and shared without having to manipulate memslots.

Right, this was a really good explanation, thank you. Still wondering whether this could possibly work (or not):

1. Union userspace_addr and private_fd.

No, because userspace needs to be able to provide both userspace_addr (shared memory) and private_fd (private memory) for a single memslot.

Got it, thanks for clearing up my misunderstandings on this topic, and it is quite obviously visible in 5/8 and 7/8. I.e. if I got it right, a memblock can be partially private, and you dig the shared holes with KVM_MEMORY_ENCRYPT_UNREG_REGION. We (in Enarx) ATM have a memblock per host mmap; I was looking into this dilated by that mindset, but it definitely makes sense to support that.

For me the most useful reference for this feature is the kvm_set_phys_mem() implementation in the privmem-v8 branch. It took a while to find it because I did not have much experience with the QEMU code base. I'd even recommend mentioning that function in the cover letter because it is a really good reference on how this feature is supposed to be used.
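The address arithmetic implied by the patch description (guest_phys_addr mapped at private_offset of private_fd, alongside a shared userspace_addr in the same memslot) can be sketched in standalone C. The struct, the flag value, and every helper name below are hypothetical stand-ins chosen for illustration; they are not the uapi definitions from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the real KVM_MEM_PRIVATE value is defined by the patch. */
#define SKETCH_MEM_PRIVATE (1u << 2)

/* Illustrative stand-in for an extended memslot; not the uapi layout. */
struct sketch_memslot {
	uint32_t flags;
	uint64_t guest_phys_addr;  /* start of the guest physical range */
	uint64_t memory_size;      /* length of the range in bytes */
	uint64_t userspace_addr;   /* shared backing (hva) */
	int      private_fd;       /* private backing file descriptor */
	uint64_t private_offset;   /* offset of the range inside private_fd */
};

/* True if gpa falls inside [guest_phys_addr, guest_phys_addr + memory_size). */
static bool sketch_gpa_in_slot(const struct sketch_memslot *s, uint64_t gpa)
{
	return gpa >= s->guest_phys_addr &&
	       gpa - s->guest_phys_addr < s->memory_size;
}

/* Shared view: translate gpa to the host virtual address. */
static bool sketch_gpa_to_hva(const struct sketch_memslot *s, uint64_t gpa,
			      uint64_t *hva)
{
	if (!sketch_gpa_in_slot(s, gpa))
		return false;
	*hva = s->userspace_addr + (gpa - s->guest_phys_addr);
	return true;
}

/* Private view: translate gpa to a byte offset inside private_fd. */
static bool sketch_gpa_to_private_off(const struct sketch_memslot *s,
				      uint64_t gpa, uint64_t *off)
{
	if (!(s->flags & SKETCH_MEM_PRIVATE) || !sketch_gpa_in_slot(s, gpa))
		return false;
	*off = s->private_offset + (gpa - s->guest_phys_addr);
	return true;
}
```

Because both translations hang off the same slot, switching a page between private and shared only changes which helper applies to that gpa; the slot itself is left untouched, which is the property the thread argues for.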
That's a good point, I can mention that if people find it useful.
While learning the QEMU code, I also noticed a bunch of comparisons like this:

    if (slot->flags & KVM_MEM_PRIVATE)

I guess those could just be replaced with unconditional fills, as it does no harm if KVM_MEM_PRIVATE is not set.
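As a side note on flag checks of this kind: testing whether a bit is set needs bitwise AND, since OR-ing with a non-zero constant always yields a non-zero (true) result. A minimal sketch, with a hypothetical flag value and helper names that are not taken from QEMU or KVM:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit, used only for illustration. */
#define SKETCH_MEM_PRIVATE (1u << 2)

/* Correct test: true only when the flag bit is set in flags. */
static bool sketch_test_with_and(uint32_t flags)
{
	return (flags & SKETCH_MEM_PRIVATE) != 0;
}

/* OR-based "test": true for every input, because the constant is non-zero. */
static bool sketch_test_with_or(uint32_t flags)
{
	return (flags | SKETCH_MEM_PRIVATE) != 0;
}
```

An always-true condition behaves the same as having no condition at all, so code guarded this way already performs the fill unconditionally.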
Makes sense, thanks.
Chao
BR, Jarkko