Thread (83 messages) 83 messages, 11 authors, 2026-02-05

Re: [RFC PATCH v1 01/37] KVM: guest_memfd: Introduce per-gmem attributes, use to guard user mappings

From: Yan Zhao <hidden>
Date: 2026-01-19 08:03:24
Also in: cgroups, kvm, linux-doc, linux-fsdevel, linux-kselftest, linux-mm, lkml

On Fri, Oct 17, 2025 at 01:11:42PM -0700, Ackerley Tng wrote:
Use the filemap invalidation lock to protect the maple tree, as taking the
lock for read when faulting in memory (for userspace or the guest) isn't
expected to result in meaningful contention, and using a separate lock
would add significant complexity (avoid deadlock is quite difficult).
... 
+static u64 kvm_gmem_get_attributes(struct inode *inode, pgoff_t index)
+{
+	void *entry = mtree_load(&GMEM_I(inode)->attributes, index);
+
+	return WARN_ON_ONCE(!entry) ? 0 : xa_to_value(entry);
+}
It looks like kvm_gmem_get_attributes() is possible to be invoked outside the
protection of inode->i_mapping->invalidate_lock.

kvm_tdp_mmu_page_fault
    kvm_mmu_faultin_pfn(vcpu, fault, ACC_ALL)
        kvm_mem_is_private(kvm, fault->gfn)
            kvm_get_memory_attributes
	        kvm_gmem_get_memory_attributes
		    kvm_gmem_get_attributes
+static int kvm_gmem_init_inode(struct inode *inode, loff_t size, u64 flags)
+{
+	struct gmem_inode *gi = GMEM_I(inode);
+	MA_STATE(mas, &gi->attributes, 0, (size >> PAGE_SHIFT) - 1);
+	u64 attrs;
+	int r;
+
+	inode->i_op = &kvm_gmem_iops;
+	inode->i_mapping->a_ops = &kvm_gmem_aops;
+	inode->i_mode |= S_IFREG;
+	inode->i_size = size;
+	mapping_set_gfp_mask(inode->i_mapping, GFP_HIGHUSER);
+	mapping_set_inaccessible(inode->i_mapping);
+	/* Unmovable mappings are supposed to be marked unevictable as well. */
+	WARN_ON_ONCE(!mapping_unevictable(inode->i_mapping));
+
+	gi->flags = flags;
+
+	mt_set_external_lock(&gi->attributes,
+			     &inode->i_mapping->invalidate_lock);
+
But here, it's declared that mtree is protected by the invalidate_lock.
+	/*
+	 * Store default attributes for the entire gmem instance. Ensuring every
+	 * index is represented in the maple tree at all times simplifies the
+	 * conversion and merging logic.
+	 */
+	attrs = gi->flags & GUEST_MEMFD_FLAG_INIT_SHARED ? 0 : KVM_MEMORY_ATTRIBUTE_PRIVATE;
+
+	/*
+	 * Acquire the invalidation lock purely to make lockdep happy. There
+	 * should be no races at this time since the inode hasn't yet been fully
+	 * created.
+	 */
+	filemap_invalidate_lock(inode->i_mapping);
+	r = mas_store_gfp(&mas, xa_mk_value(attrs), GFP_KERNEL);
+	filemap_invalidate_unlock(inode->i_mapping);
+
+	return r;
+}
+
quoted hunk ↗ jump to hunk
@@ -925,13 +986,39 @@ static struct inode *kvm_gmem_alloc_inode(struct super_block *sb)

 	mpol_shared_policy_init(&gi->policy, NULL);

+	/*
+	 * Memory attributes are protected the filemap invalidation lock, but
+	 * the lock structure isn't available at this time.  Immediately mark
+	 * maple tree as using external locking so that accessing the tree
+	 * before its fully initialized results in NULL pointer dereferences
+	 * and not more subtle bugs.
+	 */
+	mt_init_flags(&gi->attributes, MT_FLAGS_LOCK_EXTERN);
And here.
 	gi->flags = 0;
 	return &gi->vfs_inode;
 }
So, it's possible for kvm_mem_is_private() to access invalid mtree data and hit
the WARN_ON_ONCE() in kvm_gmem_get_attributes().

I reported a similar error in [*].

[*] https://lore.kernel.org/all/aIwD5kGbMibV7ksk@yzhao56-desk.sh.intel.com (local)
 
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help