Re: [PATCH v8 1/8] mm/memfd: Introduce userspace inaccessible memfd

From: Chao Peng <hidden>
Date: 2022-09-26 15:45:05
Also in: kvm, linux-api, linux-fsdevel, linux-mm, lkml, qemu-devel

On Fri, Sep 23, 2022 at 04:19:46PM +0100, Fuad Tabba wrote:

quoted

Regarding pKVM's use case, with the shim approach I believe this can be done by
allowing userspace to mmap() the "hidden" memfd, but with a ton of restrictions
piled on top.

My first thought was to make the uAPI a set of KVM ioctls so that KVM could
tightly control usage without taking on too much complexity in the kernel, but
working through things, routing the behavior through the shim itself might not be
all that horrific.

IIRC, we discarded the idea of allowing userspace to map the "private" fd because
things got too complex, but with the shim it doesn't seem _that_ bad.

E.g. on the memfd side:

  1. The entire memfd must be mapped, and at most one mapping is allowed, i.e.
     mapping is all or nothing.

  2. Acquiring a reference via get_pfn() is disallowed if there's a mapping for
     the restricted memfd.

  3. Add notifier hooks to allow downstream users to further restrict things.

  4. Disallow splitting VMAs, e.g. to force userspace to munmap() everything in
     one shot.

  5. Require that there are no outstanding references at munmap().  Or if this
     can't be guaranteed by userspace, maybe add some way for userspace to wait
     until it's ok to convert to private?  E.g. so that get_pfn() doesn't need
     to do an expensive check every time.

  static int memfd_restricted_mmap(struct file *file, struct vm_area_struct *vma)
  {
        if (vma->vm_pgoff)
                return -EINVAL;

        if ((vma->vm_end - vma->vm_start) != <file size>)
                return -EINVAL;

        mutex_lock(&data->lock);

        if (data->has_mapping) {
                r = -EINVAL;
                goto err;
        }
        list_for_each_entry(notifier, &data->notifiers, list) {
                r = notifier->ops->mmap_start(notifier, ...);
                if (r)
                        goto abort;
        }

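        /* All notifiers agreed: complete the mmap on each and record the mapping. */
        list_for_each_entry(notifier, &data->notifiers, list)
                notifier->ops->mmap_end(notifier, ...);
        data->has_mapping = true;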
        mutex_unlock(&data->lock);
        return 0;

  abort:
        list_for_each_entry_continue_reverse(notifier, &data->notifiers, list)
                notifier->ops->mmap_abort(notifier, ...);
  err:
        mutex_unlock(&data->lock);
        return r;
  }

  static void memfd_restricted_close(struct vm_area_struct *vma)
  {
        mutex_lock(...);

        /*
         * Destroy the memfd and disable all future accesses if there are
         * outstanding refcounts (or other unsatisfied restrictions?).
         */
        if (<outstanding references> || ???)
                memfd_restricted_destroy(...);
        else
                data->has_mapping = false;

        mutex_unlock(...);
  }

  static int memfd_restricted_may_split(struct vm_area_struct *area, unsigned long addr)
  {
        return -EINVAL;
  }

  static int memfd_restricted_mapping_mremap(struct vm_area_struct *new_vma)
  {
        return -EINVAL;
  }

Then on the KVM side, its mmap_start() + mmap_end() sequence would:

  1. Not be supported for TDX or SEV-SNP because they don't allow adding non-zero
     memory into the guest (after pre-boot phase).

  2. Be mutually exclusive with shared<=>private conversions, and be allowed if
     and only if the entire gfn range of the associated memslot is shared.

In general I think that this would work with pKVM. However, limiting
private<->shared conversions to the granularity of a whole memslot
might be difficult to handle in pKVM, since the guest doesn't have the
concept of memslots. For example, in pKVM right now, when a guest
shares back its restricted DMA pool with the host it does so at the
page-level. pKVM would also need a way to make an fd accessible again
when shared back, which I think isn't possible with this patch.

But does pKVM really want to mmap/munmap a new region at page granularity?
As I see it, that can cause VMA fragmentation if conversions are frequent.
Even with a KVM ioctl for mapping, as mentioned below, I think the same
issue remains.
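
To make the fragmentation concern concrete, here is a minimal userspace
sketch. It assumes Linux with /proc mounted, and it uses plain anonymous
memory rather than the restricted memfd from this series, so it only
illustrates the generic VMA behaviour, not the proposed interface. The helper
names count_vmas() and vma_growth_after_page_level_unmap() are made up for
illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count the VMAs of the calling process: /proc/self/maps has one line per VMA.
 * Raw read() into a stack buffer is used so that counting does not allocate
 * memory (and thereby change the VMA layout) itself.
 */
static long count_vmas(void)
{
        char buf[4096];
        long lines = 0;
        ssize_t n;
        int fd = open("/proc/self/maps", O_RDONLY);

        if (fd < 0)
                return -1;
        while ((n = read(fd, buf, sizeof(buf))) > 0)
                for (ssize_t i = 0; i < n; i++)
                        if (buf[i] == '\n')
                                lines++;
        close(fd);
        return n < 0 ? -1 : lines;
}

/*
 * Map npages in one shot, then unmap every other page, mimicking page-level
 * conversions. Returns how many VMAs the unmapping added, or -1 on error.
 */
static long vma_growth_after_page_level_unmap(size_t npages)
{
        long page = sysconf(_SC_PAGESIZE);
        size_t len = npages * (size_t)page;
        long before, after;
        char *base;

        base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
                return -1;

        before = count_vmas();
        for (size_t i = 1; i < npages; i += 2)
                munmap(base + i * (size_t)page, (size_t)page);
        after = count_vmas();

        munmap(base, len);      /* drop the pages that are still mapped */
        if (before < 0 || after < 0)
                return -1;
        return after - before;
}
```

Calling vma_growth_after_page_level_unmap(64) from a small main() and
printing the result should show the VMA count growing by roughly npages/2,
since each surviving page ends up in its own VMA. With frequent conversions
on a large guest, that growth is what I am worried about.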

You were initially considering a KVM ioctl for mapping, which might be
better suited for this since KVM knows which pages are shared and
which ones are private. So routing things through KVM might simplify
things and allow it to enforce all the necessary restrictions (e.g.,
private memory cannot be mapped). What do you think?

Thanks,
/fuad
