Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM
From: Kirill A. Shutemov <hidden>
Date: 2023-01-25 13:01:27
Also in: kvm, linux-arch, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel
Subsystem: memory management, the rest
Maintainers: Andrew Morton, Linus Torvalds
On Wed, Jan 25, 2023 at 12:20:26AM +0000, Sean Christopherson wrote:
> On Tue, Jan 24, 2023, Liam Merwick wrote:
> > On 14/01/2023 00:37, Sean Christopherson wrote:
> > > On Fri, Dec 02, 2022, Chao Peng wrote:
> > > > This patch series implements KVM guest private memory for confidential
> > > > computing scenarios like Intel TDX[1]. If a TDX host accesses
> > > > TDX-protected guest memory, a machine check can happen, which can
> > > > further crash the running host system; this is terrible for
> > > > multi-tenant configurations. The host accesses include those from KVM
> > > > userspace like QEMU. This series addresses KVM userspace induced
> > > > crashes by introducing new mm and KVM interfaces so that KVM userspace
> > > > can still manage guest memory via an fd-based approach but can never
> > > > access the guest memory content.
> > > >
> > > > The patch series touches both core mm and KVM code. I would appreciate
> > > > it if Andrew/Hugh and Paolo/Sean could review and pick these patches.
> > > > Any other reviews are always welcome.
> > > >   - 01: mm change, target for mm tree
> > > >   - 02-09: KVM change, target for KVM tree
> > >
> > > A version with all of my feedback, plus reworked versions of Vishal's
> > > selftest, is available here:
> > >
> > >   git@github.com:sean-jc/linux.git x86/upm_base_support
> > >
> > > It compiles and passes the selftest, but it's otherwise barely tested.
> > > There are a few todos (2 I think?) and many of the commits need
> > > changelogs, i.e. it's still a WIP.
> >
> > When running LTP (https://github.com/linux-test-project/ltp) on the v10
> > bits (and also with Sean's branch above) I encounter the following NULL
> > pointer dereference with testcases/kernel/syscalls/madvise/madvise01
> > (100% reproducible).
> >
> > It appears that in restrictedmem_error_page(), inode->i_mapping->private_data
> > is NULL for an inode visited by
> > list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list),
> > but I don't know why.
>
> Kirill, can you take a look? Or pass the buck to someone who can? :-)
The patch below should help.
diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
index 15c52301eeb9..39ada985c7c0 100644
--- a/mm/restrictedmem.c
+++ b/mm/restrictedmem.c
@@ -307,14 +307,29 @@ void restrictedmem_error_page(struct page *page, struct address_space *mapping)
 
 	spin_lock(&sb->s_inode_list_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
-		struct restrictedmem *rm = inode->i_mapping->private_data;
 		struct restrictedmem_notifier *notifier;
-		struct file *memfd = rm->memfd;
+		struct restrictedmem *rm;
 		unsigned long index;
+		struct file *memfd;
 
-		if (memfd->f_mapping != mapping)
+		if (atomic_read(&inode->i_count))
 			continue;
 
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		rm = inode->i_mapping->private_data;
+		memfd = rm->memfd;
+
+		if (memfd->f_mapping != mapping) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		spin_unlock(&inode->i_lock);
+
 		xa_for_each_range(&rm->bindings, index, notifier, start, end)
 			notifier->ops->error(notifier, start, end);
 		break;
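The core of the change is ordering: an inode found on sb->s_inodes is checked for a transient state under its i_lock before i_mapping->private_data is read. The user-space C sketch below models only that i_state filtering under stated assumptions. All model_* names, the atomic_flag-based spinlock and the three-inode list are hypothetical stand-ins, not kernel API, and the sketch leaves out the i_count check from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical spinlock built on a C11 atomic_flag (stands in for spinlock_t). */
typedef struct {
	atomic_flag flag;
} model_spinlock;

static void model_spin_lock(model_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void model_spin_unlock(model_spinlock *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Hypothetical stand-ins for I_NEW, I_FREEING and I_WILL_FREE. */
#define MODEL_I_NEW       (1u << 0)
#define MODEL_I_FREEING   (1u << 1)
#define MODEL_I_WILL_FREE (1u << 2)

struct model_rm {                      /* stands in for struct restrictedmem */
	const void *mapping;           /* stands in for rm->memfd->f_mapping */
	int notified;                  /* error notifications delivered */
};

struct model_inode {
	model_spinlock i_lock;         /* stands in for inode->i_lock */
	unsigned int i_state;          /* stands in for inode->i_state */
	struct model_rm *private_data; /* NULL while set up or torn down */
	struct model_inode *next;      /* stands in for the s_inodes list */
};

/*
 * Notify the first stable inode whose mapping matches @mapping. The state
 * check and the private_data read both happen under i_lock, so an inode
 * that is still being set up or is already being freed is skipped before
 * its private_data is dereferenced. Returns the number of notifications.
 */
static int model_error_page(model_spinlock *list_lock,
			    struct model_inode *head, const void *mapping)
{
	int delivered = 0;

	model_spin_lock(list_lock);
	for (struct model_inode *inode = head; inode; inode = inode->next) {
		struct model_rm *rm;

		model_spin_lock(&inode->i_lock);
		if (inode->i_state & (MODEL_I_NEW | MODEL_I_FREEING | MODEL_I_WILL_FREE)) {
			model_spin_unlock(&inode->i_lock);
			continue;
		}
		rm = inode->private_data;
		assert(rm != NULL); /* assumed invariant: stable inodes are fully set up */
		if (rm->mapping != mapping) {
			model_spin_unlock(&inode->i_lock);
			continue;
		}
		model_spin_unlock(&inode->i_lock);

		rm->notified++; /* stands in for the xa_for_each_range() notifier loop */
		delivered++;
		break;
	}
	model_spin_unlock(list_lock);
	return delivered;
}

static const int model_live_mapping; /* its address identifies the live mapping */

/* Walk a list holding a half-initialised inode, a dying inode and a live one. */
static int model_demo(const void *mapping)
{
	model_spinlock list_lock = { ATOMIC_FLAG_INIT };
	struct model_rm live_rm = { .mapping = &model_live_mapping, .notified = 0 };
	struct model_inode live  = { { ATOMIC_FLAG_INIT }, 0, &live_rm, NULL };
	struct model_inode dying = { { ATOMIC_FLAG_INIT }, MODEL_I_FREEING, NULL, &live };
	struct model_inode fresh = { { ATOMIC_FLAG_INIT }, MODEL_I_NEW, NULL, &dying };

	return model_error_page(&list_lock, &fresh, mapping);
}
```

In this model, model_demo(&model_live_mapping) skips the two entries whose private_data is NULL and delivers one notification, while a walk that read private_data first (as the pre-patch code did) would dereference NULL on the very first entry.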
--
Kiryl Shutsemau / Kirill A. Shutemov