Re: Runtime Memory Validation in Intel-TDX and AMD-SNP
From: Kirill A. Shutemov <hidden>
Date: 2021-07-21 13:39:57
Also in:
linux-mm
On Wed, Jul 21, 2021 at 12:20:17PM +0300, Mike Rapoport wrote:
> On Tue, Jul 20, 2021 at 08:30:04PM +0300, Kirill A. Shutemov wrote:
> > On Mon, Jul 19, 2021 at 02:58:22PM +0200, Joerg Roedel wrote:
> > > Hi,
> > > I'd like to get some movement again into the discussion around how to implement runtime memory validation for confidential guests, and I wrote up some thoughts on it. Below are the results in the form of a proposal I put together. Please let me know your thoughts on it and whether it fits everyone's requirements.
> > Thanks for bringing it up. I'm working on the topic for Intel TDX. See comments below.
> > > Thanks,
> > > Joerg
> > > Proposal for Runtime Memory Validation in Secure Guests on x86
> > > ==============================================================
> > > [ snip ]
> > > 8. When memory is returned to the memblock or page allocators, it is _not_ invalidated. In fact, all memory which is freed needs to be valid. If it was marked invalid in the meantime (e.g. if the memory was used for DMA buffers), the code owning the memory needs to validate it again before freeing it. The benefit of doing memory validation at allocation time is that it keeps the exception handler for invalid memory simple, because no exceptions of this kind are expected under normal operation.
> > During early boot I treat unaccepted memory as usable RAM. It only requires special treatment in memblock_reserve(), which is used for early memory allocation: unaccepted usable RAM has to be accepted before it is reserved.
> memblock_reserve() is not always used for early allocations, and some of the early allocations on x86 don't use memblock at all.
Do you mean any codepath in particular?
> Hooking validation/acceptance to memblock_reserve() should be fine for a PoC, but I suspect there will be caveats for production.
That's why I'm doing a PoC. We'll see; so far so good. Maybe the caveats will become visible with a smaller pre-accepted memory size.
> > For fine-grained accepting/validation tracking I use the PageOffline() flag (it's encoded into mapcount): before adding an unaccepted page to the free list I set PageOffline() to indicate that the page has to be accepted before it is returned from the page allocator. Currently, we never have PageOffline() set for pages on free lists, so there is no confusion with ballooning or memory hotplug. I try to keep pages accepted in 2M or 4M chunks (pageblock_order or MAX_ORDER). It is a reasonable speed/latency compromise.
> Keeping fine-grained accepting/validation information in the memory map means it cannot be reused across reboots/kexec, so there would have to be an additional data structure to carry this information. It could be the same structure that firmware uses to inform the kernel about usable memory; it just needs to live on after boot and receive updates about new (in)validations. Doing those in 2M/4M chunks will help keep this structure from exploding.
Yeah, we would need to reconstruct the EFI map somehow. Or we can give most of the memory back to the host and accept/validate it again after reboot/kexec. I don't know.
> BTW, as Dave mentioned, the deferred struct page init can also take care of the validation.
That was my first thought too, and I tried it, only to realize that it is not what we want. If we accepted pages at struct page init, the host would have to allocate all memory assigned to the guest at boot, even if the guest actually uses only a small portion of it. Also, deferred page init only lets us scale validation across multiple CPUs; it doesn't let us get to userspace before we are done with it. See wait_for_completion(&pgdat_init_all_done_comp).
-- 
Kirill A. Shutemov