Thread (101 messages) 101 messages, 16 authors, 2022-12-02

Re: [PATCH v9 0/8] KVM: mm: fd-based approach for supporting KVM

From: Isaku Yamahata <hidden>
Date: 2022-11-08 00:41:49
Also in: kvm, linux-arch, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel

On Thu, Nov 03, 2022 at 05:43:52PM +0530,
Vishal Annapurve [off-list ref] wrote:
On Tue, Oct 25, 2022 at 8:48 PM Chao Peng [off-list ref] wrote:
quoted
This patch series implements KVM guest private memory for confidential
computing scenarios like Intel TDX[1]. If a TDX host accesses
TDX-protected guest memory, machine check can happen which can further
crash the running host system, this is terrible for multi-tenant
configurations. The host accesses include those from KVM userspace like
QEMU. This series addresses KVM userspace induced crash by introducing
new mm and KVM interfaces so KVM userspace can still manage guest memory
via a fd-based approach, but it can never access the guest memory
content.

The patch series touches both core mm and KVM code. I appreciate
Andrew/Hugh and Paolo/Sean can review and pick these patches. Any other
reviews are always welcome.
  - 01: mm change, target for mm tree
  - 02-08: KVM change, target for KVM tree

Given KVM is the only current user for the mm part, I have chatted with
Paolo and he is OK to merge the mm change through KVM tree, but
reviewed-by/acked-by is still expected from the mm people.

The patches have been verified in Intel TDX environment, but Vishal has
done an excellent work on the selftests[4] which are dedicated for this
series, making it possible to test this series without innovative
hardware and fancy steps of building a VM environment. See Test section
below for more info.


Introduction
============
KVM userspace being able to crash the host is horrible. Under current
KVM architecture, all guest memory is inherently accessible from KVM
userspace and is exposed to the mentioned crash issue. The goal of this
series is to provide a solution to align mm and KVM, on a userspace
inaccessible approach of exposing guest memory.

Normally, KVM populates secondary page table (e.g. EPT) by using a host
virtual address (hva) from core mm page table (e.g. x86 userspace page
table). This requires guest memory being mmaped into KVM userspace, but
this is also the source where the mentioned crash issue can happen. In
theory, apart from those 'shared' memory for device emulation etc, guest
memory doesn't have to be mmaped into KVM userspace.

This series introduces fd-based guest memory which will not be mmaped
into KVM userspace. KVM populates secondary page table by using a
With no mappings in place for userspace VMM, IIUC, looks like the host
kernel will not be able to find the culprit userspace process in case
of Machine check error on guest private memory. As implemented in
hwpoison_user_mappings, host kernel tries to look at the processes
which have mapped the pfns with hardware error.

Is there a modification needed in mce handling logic of the host
kernel to immediately send a signal to the vcpu thread accessing
faulting pfn backing guest private memory?
mce_register_decode_chain() can be used.  MCE physical address(p->mce_addr)
includes host key id in addition to real physical address.  By searching used
hkid by KVM, we can determine if the page is assigned to guest TD or not. If
yes, send SIGBUS.

kvm_machine_check() can be enhanced for KVM specific use.  This is before
memory_failure() is called, though.

any other ideas?
-- 
Isaku Yamahata [off-list ref]
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help