Thread: 101 messages, 16 authors, 2022-12-02

Re: [PATCH v9 1/8] mm: Introduce memfd_restricted system call to create restricted user memory

From: Kirill A. Shutemov <hidden>
Date: 2022-11-14 15:29:16
Also in: kvm, linux-arch, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel

On Mon, Nov 14, 2022 at 03:02:37PM +0100, Vlastimil Babka wrote:
> On 11/1/22 16:19, Michael Roth wrote:
> > On Tue, Nov 01, 2022 at 07:37:29PM +0800, Chao Peng wrote:
> > > >   1) restoring kernel directmap:
> > > >
> > > >      Currently SNP (and I believe TDX) need to either split or remove kernel
> > > >      direct mappings for restricted PFNs, since there is no guarantee that
> > > >      other PFNs within a 2MB range won't be used for non-restricted memory
> > > >      (which will cause an RMP #PF in the case of SNP since the 2MB
> > > >      mapping overlaps with guest-owned pages)
> > > Has the splitting and restoring been a well-discussed direction? I'm
> > > just curious whether there are other options to solve this issue.
> > For SNP it's been discussed for quite some time, and either splitting or
> > removing private entries from directmap are the well-discussed ways I'm
> > aware of to avoid RMP violations due to some other kernel process using
> > a 2MB mapping to access shared memory if there are private pages that
> > happen to be within that range.
> >
> > In both cases the issue of how to restore directmap as 2M becomes a
> > problem.
> >
> > I was also under the impression TDX had similar requirements. If so,
> > do you know what the plan is for handling this for TDX?
> >
> > There are also 2 potential alternatives I'm aware of, but these haven't
> > been discussed in much detail AFAIK:
> >
> > a) Ensure confidential guests are backed by 2MB pages. shmem has a way to
> >    request 2MB THP pages, but I'm not sure how reliably we can guarantee
> >    that enough THPs are available, so if we went that route we'd probably
> >    be better off requiring the use of hugetlbfs as the backing store. But
> >    obviously that's a bit limiting and it would be nice to have the option
> >    of using normal pages as well. One nice thing with the invalidation
> >    scheme proposed here is that this would "Just Work" if we implement
> >    hugetlbfs support, so an admin that doesn't want any directmap
> >    splitting has this option available, otherwise it's done as a
> >    best-effort.
> >
> > b) Implement general support for restoring directmap as 2M even when
> >    subpages might be in use by other kernel threads. This would be the
> >    most flexible approach since it requires no special handling during
> >    invalidations, but I think it's only possible if all the CPA
> >    attributes for the 2M range are the same at the time the mapping is
> >    restored/unsplit, so there are some potential locking issues there and
> >    still a chance of splitting directmap over time.
> I've been hoping that
>
> c) using a mechanism such as [1] [2] where the goal is to group together
> these small allocations that need to increase directmap granularity so
> maximum number of large mappings are preserved.

As I mentioned in the other thread, the restricted memfd can be backed by
secretmem instead of plain memfd. It already handles directmap with care.

But I don't think it has to be part of the initial restricted memfd
implementation. It is an SEV-specific requirement and AMD folks can extend
the implementation as needed later.
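
For readers following the thread, the 2MB overlap problem quoted above can
be illustrated with a minimal userspace sketch. It is not kernel code: the
helper names (range_start, shares_2m_range) are hypothetical, and it assumes
4KB base pages and 2MB large mappings as on x86-64. It only shows why a
single restricted PFN affects every other PFN in the same 2MB-aligned range
of the directmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 geometry: 4 KiB base pages, 2 MiB large mappings. */
#define PAGE_SHIFT   12
#define PMD_SHIFT    21
#define PFNS_PER_2M  (UINT64_C(1) << (PMD_SHIFT - PAGE_SHIFT))  /* 512 PFNs */

/* Hypothetical helper: first PFN of the 2M-aligned range containing pfn. */
static uint64_t range_start(uint64_t pfn)
{
    return pfn & ~(PFNS_PER_2M - 1);
}

/*
 * Hypothetical helper: true if pfn lies in the same 2M-aligned range as any
 * restricted PFN, i.e. a 2M direct mapping covering pfn would also cover a
 * restricted (guest-owned) page.
 */
static bool shares_2m_range(uint64_t pfn, const uint64_t *restricted, size_t n)
{
    assert(restricted != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (range_start(restricted[i]) == range_start(pfn))
            return true;
    }
    return false;
}
```

With one restricted PFN at 0x1234, any PFN from 0x1200 through 0x13ff shares
its 2MB range, so a 2M direct mapping cannot stay in place for those
neighbours; PFN 0x1400 starts the next range and is unaffected.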

-- 
  Kiryl Shutsemau / Kirill A. Shutemov