RE: [PATCH v3 00/14] arm64: Support for running as a guest in Arm CCA | linux-arm-kernel


From: Catalin Marinas <catalin.marinas@arm.com> Sent: Monday, June 10, 2024 3:34 AM
On Fri, Jun 07, 2024 at 04:36:18PM +0000, Michael Kelley wrote:
From: Catalin Marinas <catalin.marinas@arm.com> Sent: Friday, June 7, 2024 8:13 AM
On Fri, Jun 07, 2024 at 01:38:15AM +0000, Michael Kelley wrote:
In the case of a vmalloc() address, load_unaligned_zeropad() could still
make an access to the underlying pages through the linear address. In
CoCo guests on x86, both the vmalloc PTE and the linear map PTE are
flipped, so the load_unaligned_zeropad() problem can occur only during
the transition between decrypted and encrypted. But even then, the
exception handlers have code to fixup this case and allow everything to
proceed normally.

I haven't looked at the code in your patches, but do you handle that case,
or somehow prevent it?
If we can guarantee that only a full vm_struct area is changed at a
time, the vmap guard page would prevent this issue (not sure we can
though). Otherwise I think we either change the set_memory_*() code to
deal with the other mappings or we handle the exception.
I don't think the vmap guard pages help. The vmalloc() memory consists
of individual pages that are scattered throughout the direct map. The stray
reference from load_unaligned_zeropad() will originate in a kmalloc'ed
memory page that precedes one of these scattered individual pages, and
will use a direct map kernel vaddr. So the guard pages in vmalloc space don't
come into play. At least in the Hyper-V use case, an entire vmalloc allocation
*is* flipped as a unit, so the guard pages do prevent a stray reference from
load_unaligned_zeropad() that originates in vmalloc space. At one
point I looked to see if load_unaligned_zeropad() is ever used on vmalloc
addresses.  I think the answer was "no",  making the guard page question
moot, but I'm not sure. :-(
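To make the stray-reference scenario above concrete, here is a minimal user-space sketch of the address arithmetic only. It does not call any kernel code. The constants and both helper names are assumptions for illustration (4 KiB pages, an 8-byte word); they are not taken from the patches. The sketch shows when a word-sized load that starts near the end of one page also touches the page that follows it in the direct map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed base page size */
#define SKETCH_WORD_SIZE 8u     /* assumed word size of a 64-bit kernel */

/* True if a word-sized load starting at addr touches the following page. */
static bool word_load_crosses_page(uintptr_t addr)
{
    uintptr_t offset = addr % SKETCH_PAGE_SIZE;

    return offset + SKETCH_WORD_SIZE > SKETCH_PAGE_SIZE;
}

/* Number of bytes of that load that land in the following page (0 if none). */
static unsigned int bytes_in_next_page(uintptr_t addr)
{
    uintptr_t offset = addr % SKETCH_PAGE_SIZE;

    assert(offset < SKETCH_PAGE_SIZE);
    if (offset + SKETCH_WORD_SIZE <= SKETCH_PAGE_SIZE)
        return 0;
    return (unsigned int)(offset + SKETCH_WORD_SIZE - SKETCH_PAGE_SIZE);
}
```

In this model, a load that starts 3 bytes before the end of a kmalloc'ed page reads 5 bytes from whatever page follows it in the direct map, regardless of how that following page is mapped in vmalloc space.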
My point was about load_unaligned_zeropad() originating in the vmalloc
space. What I had in mind is changing the underlying linear map via
set_memory_*() while we have live vmalloc() mappings. But I forgot about
the case you mentioned in a previous thread: set_memory_*() being called
on vmalloc()'ed memory directly:

https://lore.kernel.org/all/SN6PR02MB41578D7BFEDE33BD2E8246EFD4E92@SN6PR02MB4157.namprd02.prod.outlook.com/
OK, right.  You and I were thinking about different cases.

I wonder whether something like __GFP_DECRYPTED could be used to get
shared memory from the allocation time and avoid having to change the
vmalloc() ranges. This way functions like netvsc_init_buf() would get
decrypted memory from the start and vmbus_establish_gpadl() would not
need to call set_memory_decrypted() on a vmalloc() address.
I would not have any conceptual objections to such an approach. But I'm
certainly not an expert in that area so I'm not sure what it would take
to make that work for vmalloc(). I presume that __GFP_DECRYPTED
should also work for kmalloc()?

I've seen the separate discussion about a designated pool of decrypted
memory, to avoid always allocating a new page and decrypting when a
smaller allocation is sufficient. If such a pool could also work for page size
or larger allocations, it would have the additional benefit of concentrating
decrypted allocations in fewer 2 Meg large pages vs. scattering wherever
and forcing the break-up of more large page mappings in the direct map.
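The large-page point above can be illustrated with a small user-space sketch that counts how many 2 MiB regions a set of decrypted pages falls into. Each region counted is one large-page mapping in the direct map that would have to be broken up. The function name and the 4 KiB base page size are assumptions for illustration, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PFNS_PER_LARGE_PAGE 512u  /* 2 MiB / 4 KiB, assuming 4 KiB base pages */

/*
 * Count the distinct 2 MiB regions covered by the given page frame numbers.
 * Quadratic on purpose: the sketch favours clarity over speed.
 */
static size_t large_pages_touched(const uint64_t *pfns, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t region = pfns[i] / PFNS_PER_LARGE_PAGE;
        int seen = 0;

        for (size_t j = 0; j < i; j++) {
            if (pfns[j] / PFNS_PER_LARGE_PAGE == region) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}
```

With four scattered page frame numbers such as 10, 600, 1500 and 5000, four large-page mappings are affected; the same four pages taken from a pool inside one 2 MiB region affect only one.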

I'll note that netvsc devices can be added or removed from a running VM.
The vmalloc() memory allocated by netvsc_init_buf() can be freed, and/or
additional calls to netvsc_init_buf() can be made at any time -- they aren't
limited to initial Linux boot.  So the mechanism for getting decrypted
memory at allocation time must be reasonably dynamic.

Another thought: The use of load_unaligned_zeropad() is conditional on
CONFIG_DCACHE_WORD_ACCESS. There are #ifdef'ed alternate
implementations that don't use load_unaligned_zeropad() if it is not
enabled. I looked at just disabling it in CoCo VMs, but I don't know the
performance impact. I speculated that the benefits were more noticeable
in processors from a decade or more ago, and perhaps less so now, but
never did any measurements. There was also a snag in that x86-only
code has a usage of load_unaligned_zeropad() without an alternate
implementation, so I never went fully down that path. But arm64 would
probably "just work" if it were disabled.
We shouldn't penalise the performance, especially as I expect a single
image to run both as a guest or a host. However, I think now the linear
map is handled correctly since we make the PTE invalid before making the
page shared and this would force the fault path through the one that
safely handles load_unaligned_zeropad(). Steven's patches also currently
reject non-linear-map addresses; I guess handling those would be a separate
addition.
Rejecting vmalloc() addresses may work for the moment -- I don't know
when CCA guests might be tried on Hyper-V.  The original SEV-SNP and TDX
work started that way as well. :-) Handling the vmalloc() case was added
later, though I think on x86 the machinery to also flip all the alias PTEs was
already mostly or completely in place, probably for other reasons. So
fixing the vmalloc() case was more about not assuming that the underlying
physical address range is contiguous. Instead, each page must be processed
independently, which was straightforward.
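The difference between assuming a contiguous physical range and processing each page independently can be sketched in user space as follows. This is a simulation under stated assumptions, not the x86 or arm64 implementation: `struct sim_memory`, `sim_flip_contiguous()` and `sim_flip_per_page()` are hypothetical names, and the "flip" is just a flag per page frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_NR_PAGES 64u  /* size of the simulated physical memory, in pages */

/* One flag per simulated page frame: 1 = shared (decrypted), 0 = private. */
struct sim_memory {
    uint8_t shared[SIM_NR_PAGES];
};

/* Flip n pages assuming they are physically contiguous from first_pfn. */
static void sim_flip_contiguous(struct sim_memory *m, uint64_t first_pfn, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(first_pfn + i < SIM_NR_PAGES);
        m->shared[first_pfn + i] = 1;
    }
}

/* Flip n pages one by one, using the actual frame backing each page. */
static void sim_flip_per_page(struct sim_memory *m, const uint64_t *pfns, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(pfns[i] < SIM_NR_PAGES);
        m->shared[pfns[i]] = 1;
    }
}
```

For a vmalloc-style allocation backed by frames 5, 20 and 41, the contiguous variant would flip frames 5, 6 and 7, which is wrong for two of the three pages; the per-page variant flips exactly the frames that back the allocation.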

We also have potential user mappings, do we need to do anything about
them?
I'm unclear on the scenario here.  Would memory with a user mapping
ever be flipped between decrypted and encrypted while the user mapping
existed?
Maybe it doesn't matter. Do we expect the underlying pages to be
flipped while live mappings other than the linear map exist? I assume
not, one would first allocate and configure the memory in the kernel
before some remap_pfn_range() to user with the appropriate pgprot.
Yes, for user space mappings I also assume not.

Michael

