Thread (153 messages) 153 messages, 23 authors, 2023-05-23

Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory

From: Sean Christopherson <seanjc@google.com>
Date: 2022-12-22 18:15:37
Also in: kvm, linux-arch, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel

On Wed, Dec 21, 2022, Chao Peng wrote:
On Tue, Dec 20, 2022 at 08:33:05AM +0000, Huang, Kai wrote:
quoted
On Tue, 2022-12-20 at 15:22 +0800, Chao Peng wrote:
quoted
On Mon, Dec 19, 2022 at 08:48:10AM +0000, Huang, Kai wrote:
quoted
On Mon, 2022-12-19 at 15:53 +0800, Chao Peng wrote:
But for non-restricted-mem case, it is correct for KVM to decrease page's
refcount after setting up mapping in the secondary mmu, otherwise the page will
be pinned by KVM for normal VM (since KVM uses GUP to get the page).
That's true. Actually even true for restrictedmem case, most likely we
will still need the kvm_release_pfn_clean() for KVM generic code. On one
side, other restrictedmem users like pKVM may not require page pinning
at all. On the other side, see below.
quoted
So what we are expecting is: for KVM if the page comes from restricted mem, then
KVM cannot decrease the refcount, otherwise for normal page via GUP KVM should.
No, requiring the user (KVM) to guard against lack of support for page migration
in restricted mem is a terrible API.  It's totally fine for restricted mem to not
support page migration until there's a use case, but punting the problem to KVM
is not acceptable.  Restricted mem itself doesn't yet support page migration,
e.g. explosions would occur even if KVM wanted to allow migration since there is
no notification to invalidate existing mappings.
I argue that this page pinning (or page migration prevention) is not
tied to where the page comes from, instead related to how the page will
be used. Whether the page is restrictedmem backed or GUP() backed, once
it's used by current version of TDX then the page pinning is needed. So
such page migration prevention is really TDX thing, even not KVM generic
thing (that's why I think we don't need change the existing logic of
kvm_release_pfn_clean()). Wouldn't better to let TDX code (or who
requires that) to increase/decrease the refcount when it populates/drops
the secure EPT entries? This is exactly what the current TDX code does:
I agree that whether or not migration is supported should be controllable by the
user, but I strongly disagree on punting refcount management to KVM (or TDX).
The whole point of restricted mem is to support technologies like TDX and SNP,
accomodating their special needs for things like page migration should be part of
the API, not some footnote in the documenation.

It's not difficult to let the user communicate support for page migration, e.g.
if/when restricted mem gains support, add a hook to restrictedmem_notifier_ops
to signal support (or lack thereof) for page migration.  NULL == no migration,
non-NULL == migration allowed.

We know that supporting page migration in TDX and SNP is possible, and we know
that page migration will require a dedicated API since the backing store can't
memcpy() the page.  I don't see any reason to ignore that eventuality.

But again, unless I'm missing something, that's a future problem because restricted
mem doesn't yet support page migration regardless of the downstream user.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help