Thread (153 messages) 153 messages, 23 authors, 2023-05-23

Re: [PATCH v10 0/9] KVM: mm: fd-based approach for supporting KVM

From: Kirill A. Shutemov <hidden>
Date: 2023-01-25 13:01:27
Also in: kvm, linux-arch, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel
Subsystem: memory management, the rest · Maintainers: Andrew Morton, Linus Torvalds

On Wed, Jan 25, 2023 at 12:20:26AM +0000, Sean Christopherson wrote:
On Tue, Jan 24, 2023, Liam Merwick wrote:
quoted
On 14/01/2023 00:37, Sean Christopherson wrote:
quoted
On Fri, Dec 02, 2022, Chao Peng wrote:
quoted
This patch series implements KVM guest private memory for confidential
computing scenarios like Intel TDX[1]. If a TDX host accesses
TDX-protected guest memory, machine check can happen which can further
crash the running host system, this is terrible for multi-tenant
configurations. The host accesses include those from KVM userspace like
QEMU. This series addresses KVM userspace induced crash by introducing
new mm and KVM interfaces so KVM userspace can still manage guest memory
via a fd-based approach, but it can never access the guest memory
content.

The patch series touches both core mm and KVM code. I appreciate
Andrew/Hugh and Paolo/Sean can review and pick these patches. Any other
reviews are always welcome.
   - 01: mm change, target for mm tree
   - 02-09: KVM change, target for KVM tree
A version with all of my feedback, plus reworked versions of Vishal's selftest,
is available here:

   git@github.com:sean-jc/linux.git x86/upm_base_support

It compiles and passes the selftest, but it's otherwise barely tested.  There are
a few todos (2 I think?) and many of the commits need changelogs, i.e. it's still
a WIP.
When running LTP (https://github.com/linux-test-project/ltp) on the v10
bits (and also with Sean's branch above) I encounter the following NULL
pointer dereference with testcases/kernel/syscalls/madvise/madvise01
(100% reproducible).

It appears that in restrictedmem_error_page() inode->i_mapping->private_data
is NULL
in the list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list)
but I don't know why.
Kirill, can you take a look?  Or pass the buck to someone who can? :-)
The patch below should help.
diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
index 15c52301eeb9..39ada985c7c0 100644
--- a/mm/restrictedmem.c
+++ b/mm/restrictedmem.c
@@ -307,14 +307,29 @@ void restrictedmem_error_page(struct page *page, struct address_space *mapping)
 
 	spin_lock(&sb->s_inode_list_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
-		struct restrictedmem *rm = inode->i_mapping->private_data;
 		struct restrictedmem_notifier *notifier;
-		struct file *memfd = rm->memfd;
+		struct restrictedmem *rm;
 		unsigned long index;
+		struct file *memfd;
 
-		if (memfd->f_mapping != mapping)
+		if (atomic_read(&inode->i_count))
 			continue;
 
+		spin_lock(&inode->i_lock);
+		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+
+		rm = inode->i_mapping->private_data;
+		memfd = rm->memfd;
+
+		if (memfd->f_mapping != mapping) {
+			spin_unlock(&inode->i_lock);
+			continue;
+		}
+		spin_unlock(&inode->i_lock);
+
 		xa_for_each_range(&rm->bindings, index, notifier, start, end)
 			notifier->ops->error(notifier, start, end);
 		break;
-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help