Thread: 51 messages, 10 authors, 2021-12-03

Re: [RFC v2 PATCH 01/13] mm/shmem: Introduce F_SEAL_GUEST

From: David Hildenbrand <hidden>
Date: 2021-11-19 15:39:22
Also in: linux-fsdevel, linux-mm, lkml, qemu-devel

On 19.11.21 16:19, Jason Gunthorpe wrote:
On Fri, Nov 19, 2021 at 09:47:27PM +0800, Chao Peng wrote:
From: "Kirill A. Shutemov" <redacted>

The new seal type provides semantics required for KVM guest private
memory support. A file descriptor with the seal set is going to be used
as source of guest memory in confidential computing environments such as
Intel TDX and AMD SEV.

F_SEAL_GUEST can only be set on an empty memfd. After the seal is set,
userspace cannot read, write or mmap the memfd.

Userspace is in charge of the guest memory lifecycle: it can allocate the
memory with fallocate or punch a hole to free memory from the guest.

The file descriptor is passed down to KVM as the guest memory backend. KVM
registers itself as the owner of the memfd via memfd_register_guest().

KVM provides callbacks that need to be called on fallocate and punch
hole.

memfd_register_guest() returns callbacks that need to be used for
requesting a new page from the memfd.

Signed-off-by: Kirill A. Shutemov <redacted>
Signed-off-by: Chao Peng <redacted>
 include/linux/memfd.h      |  24 ++++++++
 include/linux/shmem_fs.h   |   9 +++
 include/uapi/linux/fcntl.h |   1 +
 mm/memfd.c                 |  33 +++++++++-
 mm/shmem.c                 | 123 ++++++++++++++++++++++++++++++++++++-
 5 files changed, 186 insertions(+), 4 deletions(-)
diff --git a/include/linux/memfd.h b/include/linux/memfd.h
index 4f1600413f91..ff920ef28688 100644
--- a/include/linux/memfd.h
+++ b/include/linux/memfd.h
@@ -4,13 +4,37 @@
 
 #include <linux/file.h>
 
+struct guest_ops {
+	void (*invalidate_page_range)(struct inode *inode, void *owner,
+				      pgoff_t start, pgoff_t end);
+	void (*fallocate)(struct inode *inode, void *owner,
+			  pgoff_t start, pgoff_t end);
+};
+
+struct guest_mem_ops {
+	unsigned long (*get_lock_pfn)(struct inode *inode, pgoff_t offset,
+				      bool alloc, int *order);
+	void (*put_unlock_pfn)(unsigned long pfn);
+
+};
Ignoring confidential compute for a moment

If qemu can put all the guest memory in a memfd and not map it, then
I'd also like to see that the IOMMU can use this interface too so we
can have VFIO working in this configuration.
In QEMU we usually want to (and must) be able to access guest memory
from user space. With the current design we wouldn't even be able to
temporarily mmap it -- which makes sense for encrypted memory only. The
corner case really is encrypted memory, so I don't think we'll see
broad use of this feature outside of encrypted VMs in QEMU. I might be
wrong, most probably I am :)
As designed the above looks useful to import a memfd to a VFIO
container but could you consider some more generic naming than calling
this 'guest' ?
+1, the guest terminology is somewhat sub-optimal.
Along the same lines, to support fast migration, we'd want to be able
to send these things to the RDMA subsystem as well so we can do data
xfer. Very similar to VFIO.

Also, shouldn't this be two patches? F_SEAL is not really related to
these accessors, is it?

Apart from the special "encrypted memory" semantics, I assume nothing
speaks against allowing for mmapping these memfds, for example, for any
other VFIO use cases.
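
For reference, a minimal userspace sketch of a sealed memfd that is still mmapped, with the fallocate and punch-hole lifecycle from the patch description. F_SEAL_GUEST is not defined in mainline headers, so the existing F_SEAL_SHRINK and F_SEAL_GROW seals stand in for it here. The helper names (`make_sealed_memfd`, `populate`, `punch`) and the memfd name are made up for illustration.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a sealable memfd of `size` bytes and forbid later resizing.
 * F_SEAL_SHRINK | F_SEAL_GROW are stand-ins for the proposed seal.
 * Returns the fd, or -1 on error.
 */
int make_sealed_memfd(off_t size)
{
	int fd = memfd_create("guest-mem-demo",
			      MFD_CLOEXEC | MFD_ALLOW_SEALING);

	if (fd < 0)
		return -1;
	if (ftruncate(fd, size) < 0 ||
	    fcntl(fd, F_ADD_SEALS, F_SEAL_SHRINK | F_SEAL_GROW) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Allocate backing pages for [off, off + len), within the file size. */
int populate(int fd, off_t off, off_t len)
{
	return fallocate(fd, 0, off, len);
}

/* Free backing pages for [off, off + len) without changing file size. */
int punch(int fd, off_t off, off_t len)
{
	return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			 off, len);
}
```

With only these two seals set, the fd can still be mapped MAP_SHARED and read or written through the mapping. Refusing the mapping is what the new seal would add on top, for the encrypted case.
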

-- 
Thanks,

David / dhildenb