Thread: 62 messages, 5 authors, 2021-04-21

Re: [RFC Part2 PATCH 07/30] mm: add support to split the large THP based on RMP violation

From: Brijesh Singh <hidden>
Date: 2021-04-21 13:44:14
Also in: linux-crypto, lkml

On 4/21/21 7:59 AM, Vlastimil Babka wrote:
On 3/25/21 4:59 PM, Dave Hansen wrote:
On 3/25/21 8:24 AM, Brijesh Singh wrote:
On 3/25/21 9:48 AM, Dave Hansen wrote:
On 3/24/21 10:04 AM, Brijesh Singh wrote:
When SEV-SNP is enabled globally in the system, a write from the hypervisor
can raise an RMP violation. We can resolve the RMP violation by splitting
the virtual address to a lower page level.

e.g.
- guest made a page shared in the RMP entry so that the hypervisor
  can write to it.
- the hypervisor has mapped the pfn as a large page. A write access
  will cause an RMP violation if one of the pages within the 2MB region
  is a guest private page.

The above RMP violation can be resolved by simply splitting the large
page.
What if the large page is provided by hugetlbfs?
I was not able to find a method to split the large pages in the
hugetlbfs. Unfortunately, at this time a VMM cannot use the backing
memory from the hugetlbfs pool. An SEV-SNP aware VMM can use either
transparent hugepage or small pages.
That's really, really nasty.  Especially since it might not be evident
until long after boot and the guest is killed.
I'd assume a SNP-aware QEMU would be needed in the first place and thus this
QEMU would know not to use hugetlbfs?

Yes, that is correct. The QEMU patches will not launch an SEV-SNP guest
when hugetlbfs is used. I can also look into adding a check in the kernel
to ensure that the backing pages do not come from hugetlbfs, so that a
non-QEMU VMM will also fail to create the SNP guest.
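
For illustration only, here is a minimal userspace sketch of how a VMM
could detect hugetlbfs-backed memory before launching a guest. It is not
the QEMU patch and not the in-kernel check mentioned above. The helper
name toy_path_on_hugetlbfs() and the TOY_ prefix are hypothetical; the
magic value is the one defined as HUGETLBFS_MAGIC in <linux/magic.h>.

```c
#include <sys/vfs.h>

/* Same value as HUGETLBFS_MAGIC in <linux/magic.h>. */
#define TOY_HUGETLBFS_MAGIC 0x958458f6UL

/*
 * Hypothetical helper: returns 1 if path lives on hugetlbfs, 0 if it
 * does not, and -1 if statfs() fails (errno is left as set by statfs()).
 */
static int toy_path_on_hugetlbfs(const char *path)
{
	struct statfs st;

	if (statfs(path, &st) != 0)
		return -1;

	/* Cast so the comparison is unsigned on both 32-bit and 64-bit. */
	return (unsigned long)st.f_type == TOY_HUGETLBFS_MAGIC ? 1 : 0;
}
```

A VMM following this approach would refuse to create the SNP guest when
the helper returns 1 for the memory backend path.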
It's even nastier because hugetlbfs is actually a great fit for SEV-SNP
memory.  It's physically contiguous, so it would keep you from having to
Maybe this could be solvable by remapping the hugetlbfs page with pte's when
needed (a guest wants to share 4k out of 2MB with the host temporarily). But
certainly never as flexibly as pte-mapped THP's as the complexity of that
(refcounting tail pages etc) is significant.
fracture the direct map all the way down to 4k, it also can't be
reclaimed (just like all SEV memory).
About that... the whitepaper I've seen [1] mentions support for swapping guest
pages. I'd expect the same mechanism could be used for their migration -
scattering 4kB unmovable SEV pages around would be terrible for fragmentation. I
assume neither swap nor migration support is part of the patchset(s) yet?

Yes, the patches do not support swapping guest pages yet. We want to
add the support incrementally. Swap/move can be implemented after we
have the base support enabled in the kernel. Both the SEV and SNP
firmware provide PSP commands that can be used to swap guest pages. I
believe the KVM MMU notifier can use them during a page move.
I think the minimal thing you can do here is to fail to add memory to
the RMP in the first place if you can't split it.  That way, users will
at least fail to _start_ their VM versus dying randomly for no good reason.

Even better would be to come up with a stronger contract between host
and guest.  I really don't think the host should be exposed to random
RMP faults on the direct map.  If the guest wants to share memory, then
it needs to tell the host and give the host an opportunity to move the
guest physical memory.  It might, for instance, sequester all the shared
pages in a single spot to minimize direct map fragmentation.
Agreed, and the contract should be elaborated before going to implementation
details (patches). Could a malicious guest violate such contract unilaterally? I
guess not, because psmash is a hypervisor instruction? And if yes, the
RMP-specific page fault handlers would be used just to kill such guest, not to
fix things up during page fault.

Version 2 of the GHCB specification defines a contract between the guest
and the host. When the guest is ready to share a page with the host, it
issues a page state change request to the hypervisor. The hypervisor is
responsible for updating the page's entry in the RMP table using the
RMPUPDATE instruction. The page state change request includes an
operation field. The operation can be one of the following:

1. Add page in RMP table (make guest page private)

2. Remove page from RMP table (make guest page shared)

3. Psmash - split the large RMP entry

4. Unsmash - merge small RMP entries into a large one. The unsmash
operation requires PSP assistance.
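
As a rough sketch, the four operations listed above could be represented
as follows. All identifiers here are hypothetical, and the numeric values
simply mirror the numbering of the list; they are not claimed to match
the encoding in the GHCB specification.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; values follow the numbering of the list above. */
enum toy_psc_op {
	TOY_PSC_PRIVATE = 1,	/* add page to the RMP table (guest private) */
	TOY_PSC_SHARED  = 2,	/* remove page from the RMP table (shared) */
	TOY_PSC_PSMASH  = 3,	/* split a large RMP entry */
	TOY_PSC_UNSMASH = 4,	/* merge small RMP entries into a large one */
};

/* Returns a short name for a known operation, or NULL if unknown. */
static const char *toy_psc_op_name(int op)
{
	switch (op) {
	case TOY_PSC_PRIVATE:	return "private";
	case TOY_PSC_SHARED:	return "shared";
	case TOY_PSC_PSMASH:	return "psmash";
	case TOY_PSC_UNSMASH:	return "unsmash";
	default:		return NULL;
	}
}

/* Per the list above, only the merge operation needs help from the PSP. */
static bool toy_psc_op_needs_psp(int op)
{
	return op == TOY_PSC_UNSMASH;
}
```

A hypervisor-side handler would reject any request for which
toy_psc_op_name() returns NULL before touching the RMP.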

The current RMP-specific fault handler checks whether the host is
attempting to write to a guest private page. If so, it kills the guest.
I guess this covers the case where a malicious guest violates the
contract by not issuing the page state change request.
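
To make the decision concrete, here is a toy model of what happens on a
host write fault. It is not the code from the patch. It combines the
kill-the-guest check described here with the split case from the patch
description, assuming a 2MB region of 512 4K pages. The types and
function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES_PER_2M 512

enum toy_rmp_action {
	TOY_RMP_RETRY,		/* nothing to fix up, retry the access */
	TOY_RMP_SPLIT_MAPPING,	/* split the 2MB host mapping into 4K pages */
	TOY_RMP_KILL_GUEST,	/* host wrote to a guest private page */
};

/* Toy stand-in for the RMP state of one 2MB-aligned region. */
struct toy_2m_region {
	bool guest_private[TOY_PAGES_PER_2M];
	bool mapped_large;	/* host maps the region with one large page */
};

static bool toy_region_has_private(const struct toy_2m_region *r)
{
	for (size_t i = 0; i < TOY_PAGES_PER_2M; i++)
		if (r->guest_private[i])
			return true;
	return false;
}

/* idx is the faulting 4K page and must be below TOY_PAGES_PER_2M. */
static enum toy_rmp_action
toy_rmp_write_fault(const struct toy_2m_region *r, size_t idx)
{
	/* A write to a private page cannot be fixed up by the host. */
	if (r->guest_private[idx])
		return TOY_RMP_KILL_GUEST;

	/* Shared target, but a private neighbour blocks the large mapping. */
	if (r->mapped_large && toy_region_has_private(r))
		return TOY_RMP_SPLIT_MAPPING;

	return TOY_RMP_RETRY;
}
```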

I'll let the other x86 folks chime in on this, but I really think this
needs a different approach than what's being proposed.
Not an x86 folk, but agreed :)

[1]
https://www.amd.com/system/files/TechDocs/SEV-SNP-strengthening-vm-isolation-with-integrity-protection-and-more.pdf