Thread (164 messages) 164 messages, 20 authors, 2023-05-25

Re: [PATCH v7 00/14] KVM: mm: fd-based approach for supporting KVM guest private memory

From: Sean Christopherson <seanjc@google.com>
Date: 2022-08-19 00:20:38
Also in: kvm, linux-doc, linux-fsdevel, linux-kselftest, linux-mm, lkml, qemu-devel

On Thu, Aug 18, 2022, Kirill A . Shutemov wrote:
On Wed, Aug 17, 2022 at 10:40:12PM -0700, Hugh Dickins wrote:
quoted
On Wed, 6 Jul 2022, Chao Peng wrote:
But since then, TDX in particular has forced an effort into preventing
(by flags, seals, notifiers) almost everything that makes it shmem/tmpfs.

Are any of the shmem.c mods useful to existing users of shmem.c? No.
Is MFD_INACCESSIBLE useful or comprehensible to memfd_create() users? No.
But QEMU and other VMMs are users of shmem and memfd.  The new features certainly
aren't useful for _all_ existing users, but I don't think it's fair to say that
they're not useful for _any_ existing users.
quoted
What use do you have for a filesystem here?  Almost none.
IIUC, what you want is an fd through which QEMU can allocate kernel
memory, selectively free that memory, and communicate fd+offset+length
to KVM.  And perhaps an interface to initialize a little of that memory
from a template (presumably copied from a real file on disk somewhere).

You don't need shmem.c or a filesystem for that!

If your memory could be swapped, that would be enough of a good reason
to make use of shmem.c: but it cannot be swapped; and although there
are some references in the mailthreads to it perhaps being swappable
in future, I get the impression that will not happen soon if ever.

If your memory could be migrated, that would be some reason to use
filesystem page cache (because page migration happens to understand
that type of memory): but it cannot be migrated.
Migration support is in pipeline. It is part of TDX 1.5 [1]. 
And this isn't intended for just TDX (or SNP, or pKVM).  We're not _that_ far off
from being able to use UPM for "regular" VMs as a way to provide defense-in-depth
without having to take on the overhead of confidential VMs.  At that point,
migration and probably even swap are on the table.
And swapping theoretically possible, but I'm not aware of any plans as of
now.
Ya, I highly doubt confidential VMs will ever bother with swap.
quoted
I'm afraid of the special demands you may make of memory allocation
later on - surprised that huge pages are not mentioned already;
gigantic contiguous extents? secretmem removed from direct map?
The design allows for extension to hugetlbfs if needed. Combination of
MFD_INACCESSIBLE | MFD_HUGETLB should route this way. There should be zero
implications for shmem. It is going to be separate struct memfile_backing_store.

I'm not sure secretmem is a fit here as we want to extend MFD_INACCESSIBLE
to be movable if platform supports it and secretmem is not migratable by
design (without direct mapping fragmentations).
But secretmem _could_ be a fit.  If a use case wants to unmap guest private memory
from both userspace and the kernel then KVM should absolutely be able to support
that, but at the same time I don't want to have to update KVM to enable secretmem
(and I definitely don't want KVM poking into the directmap itself).

MFD_INACCESSIBLE should only say "this memory can't be mapped into userspace",
any other properties should be completely separate, e.g. the inability to migrate
pages is effective a restriction from KVM (acting on behalf of TDX/SNP), it's not
a fundamental property of MFD_INACCESSIBLE.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help