Thread: 37 messages, 13 authors, 2021-08-02

Re: Runtime Memory Validation in Intel-TDX and AMD-SNP

From: Kirill A. Shutemov <hidden>
Date: 2021-07-21 14:04:57
Also in: linux-mm

On Wed, Jul 21, 2021 at 11:25:25AM +0200, Joerg Roedel wrote:
Hi Kirill,

On Tue, Jul 20, 2021 at 08:30:04PM +0300, Kirill A. Shutemov wrote:
quoted
On Mon, Jul 19, 2021 at 02:58:22PM +0200, Joerg Roedel wrote:
We use EFI unaccepted memory type to pass this information between
firmware and kernel. In my WIP patch I translate it to a new E820 memory
type: E820_TYPE_UNACCEPTED.
Yeah, that is what I meant with a new E820 entry type.
quoted
E820 can also be used during early boot for tracking what memory got
accepted by kernel too.
Won't this get very fragmented? How do you handle overlaps with other
E820 regions?
I modify E820 as needed:

	e820__range_update(start, end, E820_TYPE_UNACCEPTED, E820_TYPE_RAM);

I also ask memblock for bottom-up allocation as it helps with using
pre-accepted pages first and reduces fragmentation:

	memblock_set_bottom_up(true);
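
To make the range update above concrete, here is a minimal userspace sketch of retyping part of an unaccepted range to RAM, splitting the entry where needed. It is not the kernel's E820 code: the toy_* names, the fixed-size table and the [start, end) bounds are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for E820 memory types (hypothetical, not kernel values). */
enum toy_type { TOY_NONE = 0, TOY_RAM, TOY_UNACCEPTED };

/* Half-open range [start, end). */
struct toy_range {
	uint64_t start;
	uint64_t end;
	enum toy_type type;
};

#define TOY_MAX_RANGES 32

struct toy_table {
	struct toy_range r[TOY_MAX_RANGES];
	size_t nr;
};

int toy_add(struct toy_table *t, uint64_t start, uint64_t end,
	    enum toy_type type)
{
	if (t->nr == TOY_MAX_RANGES || start >= end)
		return -1;
	t->r[t->nr].start = start;
	t->r[t->nr].end = end;
	t->r[t->nr].type = type;
	t->nr++;
	return 0;
}

/*
 * Retype the parts of old_type entries that overlap [start, end) to
 * new_type. Non-overlapping heads and tails are split off and keep
 * old_type. Returns the number of bytes retyped.
 */
uint64_t toy_retype(struct toy_table *t, uint64_t start, uint64_t end,
		    enum toy_type old_type, enum toy_type new_type)
{
	uint64_t updated = 0;
	size_t n = t->nr;	/* entries appended below need no second look */

	assert(t->nr <= TOY_MAX_RANGES);

	for (size_t i = 0; i < n; i++) {
		struct toy_range *e = &t->r[i];
		uint64_t e_start = e->start, e_end = e->end;
		uint64_t lo, hi;

		if (e->type != old_type || e_end <= start || e_start >= end)
			continue;
		if (t->nr + 2 > TOY_MAX_RANGES)
			break;	/* no room left for a possible head and tail */

		lo = e_start > start ? e_start : start;
		hi = e_end < end ? e_end : end;

		if (e_start < lo)
			toy_add(t, e_start, lo, old_type);	/* head */
		if (hi < e_end)
			toy_add(t, hi, e_end, old_type);	/* tail */

		e->start = lo;
		e->end = hi;
		e->type = new_type;
		updated += hi - lo;
	}
	return updated;
}

/* Look up the type covering addr, or TOY_NONE if no entry covers it. */
enum toy_type toy_type_at(const struct toy_table *t, uint64_t addr)
{
	for (size_t i = 0; i < t->nr; i++)
		if (addr >= t->r[i].start && addr < t->r[i].end)
			return t->r[i].type;
	return TOY_NONE;
}
```

Accepting the middle of one large unaccepted entry leaves three entries behind, which is the fragmentation concern raised above. Handing out memory bottom-up keeps the accepted part contiguous at the low end, so fewer splits are needed.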
quoted
For now, I debug with 256MiB accepted by firmware. It lets me avoid
dealing with decompression code at this stage of the project. I plan to
lower the number later.
Yes, this can be experimented with, the proposal allows a custom amount
of memory to be pre-validated/accepted.
quoted
I would argue for per-range, not per-page, tracking of accepted/validated
memory for the decompressor and early boot code, until the page allocator
is fully functional. I have reasonable success with this approach so far.
What do you mean by 'reasonable' success?
It appears to work fine with 256MiB of pre-accepted memory, but more
testing is required.
Especially, how robust is that against unrelated changes to the boot
code? As with SEV-SNP, I guess there will be no broad testing of
unrelated kernel changes in a TDX environment, so some robustness is key
to keep things working.
Hard to say. Let me get the prototype functional first. It's easier to
discuss with code in hand.
quoted
During early boot I treat unaccepted memory as usable RAM. It only
requires special treatment on memblock_reserve(), which is used for early
memory allocation: unaccepted usable RAM has to be accepted before
reserving.
What happens before memblock is active, say in the decompressor. Will
unaccepted memory be considered for KASLR placement?
I tried to postpone thinking about the decompressor as long as possible :P

I guess we need to pass down information about memory accepted in the
decompressor to the main kernel so it can record it in E820. I think it
will be a single range.
quoted
For fine-grained accepting/validation tracking I use PageOffline() flags
(it's encoded into mapcount): before adding an unaccepted page to free
list I set the PageOffline() to indicate that the page has to be accepted
before returning from the page allocator. Currently, we never have
PageOffline() set for pages on free lists, so we won't have confusion with
ballooning or memory hotplug.
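
The free-list scheme described above can be sketched in userspace as follows. This is only a toy model, not the actual patch: the toy_* names, the needs_accept flag (standing in for PageOffline() on a free page) and the accept counter (standing in for the real accept operation) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8

struct toy_page {
	bool on_free_list;
	bool needs_accept;	/* plays the role of PageOffline() on a free page */
};

struct toy_zone {
	struct toy_page pages[TOY_NR_PAGES];
	unsigned long accept_calls;	/* counts simulated accept operations */
};

/* Put a page on the free list; flag it if it has not been accepted yet. */
void toy_free_page(struct toy_zone *z, size_t pfn, bool accepted)
{
	assert(pfn < TOY_NR_PAGES);
	z->pages[pfn].on_free_list = true;
	z->pages[pfn].needs_accept = !accepted;
}

/*
 * Take the lowest free page. A flagged page is accepted before it is
 * handed to the caller, so callers never see unaccepted memory.
 * Returns the pfn, or -1 if nothing is free.
 */
long toy_alloc_page(struct toy_zone *z)
{
	for (size_t pfn = 0; pfn < TOY_NR_PAGES; pfn++) {
		struct toy_page *p = &z->pages[pfn];

		if (!p->on_free_list)
			continue;
		p->on_free_list = false;
		if (p->needs_accept) {
			z->accept_calls++;	/* stand-in for the real accept */
			p->needs_accept = false;
		}
		return (long)pfn;
	}
	return -1;
}
```

The point of the design is that the accept cost is paid lazily, on the allocation path, and only for pages that still carry the flag.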
Okay, I think that could also easily break with unrelated memory
management changes, but should work for now in TDX.
quoted
I try to keep pages accepted in 2M or 4M chunks (pageblock_order or
MAX_ORDER). It is a reasonable compromise on speed/latency.
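
As a rough illustration of accepting in fixed-size chunks, the arithmetic below rounds an address down to its enclosing chunk and counts the chunks covering a range. The 2 MiB chunk size (order 9 with 4 KiB pages) and all TOY_* names are assumptions for this sketch, not values taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12	/* 4 KiB pages */
#define TOY_CHUNK_ORDER	9	/* 2^9 pages * 4 KiB = 2 MiB */
#define TOY_CHUNK_SIZE	((uint64_t)1 << (TOY_PAGE_SHIFT + TOY_CHUNK_ORDER))

static_assert((TOY_CHUNK_SIZE & (TOY_CHUNK_SIZE - 1)) == 0,
	      "chunk size must be a power of two for the masking below");

/* Start of the chunk that contains addr. */
uint64_t toy_chunk_start(uint64_t addr)
{
	return addr & ~(TOY_CHUNK_SIZE - 1);
}

/* First address past the chunk that contains addr. */
uint64_t toy_chunk_end(uint64_t addr)
{
	return toy_chunk_start(addr) + TOY_CHUNK_SIZE;
}

/* Number of whole chunks that must be accepted to cover [start, end). */
uint64_t toy_chunks_covering(uint64_t start, uint64_t end)
{
	if (start >= end)
		return 0;
	return (toy_chunk_start(end - 1) - toy_chunk_start(start)) /
	       TOY_CHUNK_SIZE + 1;
}
```

A request that straddles a chunk boundary, even by one page on each side, costs two chunk accepts. That is the latency side of the compromise.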
Makes sense, SEV-SNP will likely do something similar.
quoted
I'm not sure a bitmap is needed. I hope we can use E820 for early
tracking. But let's see if it works.
We should find a solution which works for TDX and SNP, given that the
required changes are intrusive and that it is much easier to just
support one way to handle this.

That said, the Validation Bitmap has a clear benefit for SEV-SNP in that
it makes it trivial to support kexec/kdump scenarios. Further the
bitmap makes it trivial to transport the information through the whole
boot process. It also won't be big, SNP (and I think TDX too) would
be okay with one bit per 4k page, so the bitmap would need 32 KiB of
memory per GB of guest RAM.
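
The size estimate is easy to check: 1 GiB / 4 KiB = 262144 pages, so 262144 bits, so 32768 bytes. A minimal sketch of such a bitmap follows, assuming one bit per 4 KiB page indexed by page frame number. The toy_* names and the byte-array layout are illustrative assumptions, not the proposed SNP format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL

/* Bytes needed for one bit per 4 KiB page, rounded up. */
size_t toy_bitmap_bytes(uint64_t ram_bytes)
{
	uint64_t pages = (ram_bytes + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;

	return (size_t)((pages + 7) / 8);
}

/* Record that the page with this frame number has been validated. */
void toy_bitmap_mark(uint8_t *bm, uint64_t pfn)
{
	assert(bm != NULL);
	bm[pfn / 8] |= (uint8_t)(1u << (pfn % 8));
}

/* Has the page with this frame number been validated? */
bool toy_bitmap_test(const uint8_t *bm, uint64_t pfn)
{
	assert(bm != NULL);
	return (bm[pfn / 8] & (1u << (pfn % 8))) != 0;
}
```

This sketch is not safe for concurrent updates; a shared bitmap would need atomic bit operations or a lock.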
Yes, the bitmap is small, but it is going to be a rather hot structure.
It has to be consulted on every page allocation, right?

How do you plan to make the bitmap scalable? What are the locking rules
around it?
And keeping the information separate from struct page will make the code
more robust against unrelated code changes.
-- 
 Kirill A. Shutemov