Re: [RFC] KVM: mm: fd-based approach for supporting KVM guest private memory
From: Sean Christopherson <seanjc@google.com>
Date: 2021-08-31 20:45:23
Also in: kvm, linux-coco, lkml
On Tue, Aug 31, 2021, David Hildenbrand wrote:
> On 28.08.21 00:28, Sean Christopherson wrote:
> > On Fri, Aug 27, 2021, Andy Lutomirski wrote:
> > > On Thu, Aug 26, 2021, at 2:26 PM, David Hildenbrand wrote:
> > > > On 26.08.21 19:05, Andy Lutomirski wrote:
> > > > > Oof. That's quite a requirement. What's the point of the VMA once all this is done?
> > > >
> > > > You can keep using things like mbind(), madvise(), ... and the GUP code with a special flag might mostly just do what you want. You won't have to reinvent too many wheels on the page fault logic side at least.
> >
> > Ya, Kirill's RFC more or less proved a special GUP flag would indeed Just Work. However, the KVM page fault side of things would require only a handful of small changes to send private memslots down a different path. Compared to the rest of the enabling, it's quite minor.
> >
> > The counter to that is other KVM architectures would need to learn how to use the new APIs, though I suspect that there will be a fair bit of arch enabling regardless of what route we take.
> >
> > > You can keep calling the functions. The implementations working is a different story: you can't just unmap (pte_numa-style or otherwise) a private guest page to quiesce it, move it with memcpy(), and then fault it back in.
> >
> > Ya, I brought this up in my earlier reply. Even the initial implementation (without real NUMA support) would likely be painful, e.g. the KVM TDX RFC/PoC adds dedicated logic in KVM to handle the case where NUMA balancing zaps a _pinned_ page and then KVM fault in the same pfn. It's not thaaat ugly, but it's arguably more invasive to KVM's page fault flows than a new fd-based private memslot scheme.
>
> I might have a different mindset, but less code churn doesn't necessarily translate to "better approach".
I wasn't referring to code churn. By "invasive" I mean the number of touchpoints in KVM as well as the nature of those touchpoints, e.g. poking into how KVM uses available bits in its shadow PTEs and adding multiple checks throughout KVM's page fault handler, versus two callbacks to get the PFN and page size.
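To make the "two callbacks" shape concrete, here is a rough userspace sketch of what such a touchpoint could look like. Every name in it (private_mem_ops, get_pfn, get_page_size, resolve_fault, the toy store) is hypothetical and invented for illustration; none of it is taken from KVM or from any posted patch, and the existing shared-memory path is deliberately not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;
typedef uint64_t pfn_t;

/* Hypothetical callback pair; these names are not from KVM or the RFC. */
struct private_mem_ops {
	/* Look up the host PFN backing page 'index' of the store; 0 on success. */
	int (*get_pfn)(void *store, uint64_t index, pfn_t *pfn);
	/* Size in bytes of the mapping that backs page 'index'. */
	size_t (*get_page_size)(void *store, uint64_t index);
};

/* Simplified memslot: a GFN range plus an optional private backing store. */
struct memslot {
	gfn_t base_gfn;
	uint64_t npages;
	bool is_private;
	void *store;
	const struct private_mem_ops *ops;
};

/* Toy store: physically contiguous 4 KiB pages starting at base_pfn. */
struct toy_store {
	pfn_t base_pfn;
	uint64_t npages;
};

static int toy_get_pfn(void *store, uint64_t index, pfn_t *pfn)
{
	struct toy_store *s = store;

	if (index >= s->npages)
		return -1;
	*pfn = s->base_pfn + index;
	return 0;
}

static size_t toy_get_page_size(void *store, uint64_t index)
{
	(void)store;
	(void)index;
	return 4096;
}

static const struct private_mem_ops toy_ops = {
	.get_pfn = toy_get_pfn,
	.get_page_size = toy_get_page_size,
};

/*
 * The single touchpoint in the fault path: private slots go through the
 * two callbacks, everything else would take the existing path.
 */
static int resolve_fault(const struct memslot *slot, gfn_t gfn,
			 pfn_t *pfn, size_t *size)
{
	uint64_t index;

	if (gfn < slot->base_gfn || gfn - slot->base_gfn >= slot->npages)
		return -1;
	if (!slot->is_private)
		return -2;	/* existing shared-memory path, out of scope here */

	index = gfn - slot->base_gfn;
	*size = slot->ops->get_page_size(slot->store, index);
	return slot->ops->get_pfn(slot->store, index, pfn);
}
```

The point of the sketch is only that the fault handler needs one branch on the slot type; how the backing store finds the page stays behind the callbacks.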
> I'm certainly not pushing for what I proposed (it's a rough, broken sketch). I'm much rather trying to come up with alternatives that try solving the same issue, handling the identified requirements. I have a gut feeling that the list of requirements might not be complete yet. For example, I wonder if we have to protect against user space replacing private pages by shared pages or punching random holes into the encrypted memory fd.
Replacing a private page with a shared page for a given GFN is very much a requirement as it's expected behavior for all VMM+guests when converting guest memory between shared and private. Punching holes is a sort of optional requirement. It's a "requirement" in that it's allowed if the backing store supports such a behavior, optional in that support wouldn't be strictly necessary and/or could come with constraints. The expected use case is that host userspace would punch a hole to free unreachable private memory, e.g. after the corresponding GFN(s) have been converted to shared, so that the host doesn't end up consuming 2x memory for the guest.
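For illustration, the hole-punching step from host userspace could look roughly like the following. This is a sketch under assumptions: a plain memfd stands in for the private-memory fd (whose real interface is not defined in this thread), and it assumes the backing store honors fallocate() with FALLOC_FL_PUNCH_HOLE the way tmpfs/memfd does. The helper names are made up for the example.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Stand-in for a private-memory fd: an anonymous memfd filled with 0xAA.
 * 'len' is expected to be a multiple of 4096.
 */
static int make_backing_fd(size_t len)
{
	char buf[4096];
	int fd = memfd_create("private-mem-standin", MFD_CLOEXEC);

	if (fd < 0)
		return -1;
	memset(buf, 0xAA, sizeof(buf));
	for (size_t done = 0; done < len; done += sizeof(buf)) {
		if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
			close(fd);
			return -1;
		}
	}
	return fd;
}

/* Free the pages backing [offset, offset + len) while keeping the file size. */
static int punch_range(int fd, off_t offset, off_t len)
{
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 offset, len);
}

/* Bytes of backing storage currently allocated, per fstat(); -1 on error. */
static long long allocated_bytes(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return -1;
	return (long long)st.st_blocks * 512;
}

/* Return 1 if every byte in [offset, offset + len) reads back as zero. */
static int range_is_zero(int fd, off_t offset, size_t len)
{
	unsigned char byte;

	for (size_t i = 0; i < len; i++) {
		if (pread(fd, &byte, 1, offset + (off_t)i) != 1 || byte != 0)
			return 0;
	}
	return 1;
}
```

A VMM would call something like punch_range() on the private fd for the range whose GFNs were just converted to shared; comparing allocated_bytes() before and after is one way to check that the backing pages were actually released.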