Thread (39 messages) 39 messages, 9 authors, 2021-06-04

Re: [RFCv2 13/13] KVM: unmap guest memory using poisoned pages

From: Sean Christopherson <seanjc@google.com>
Date: 2021-06-03 19:48:17
Also in: linux-mm, lkml

On Thu, Jun 03, 2021, Kirill A. Shutemov wrote:
On Wed, Jun 02, 2021 at 05:51:02PM +0000, Sean Christopherson wrote:
quoted
quoted
Omitting FOLL_GUEST for shared memory doesn't look like a right approach.
IIUC, it would require the kernel to track what memory is share and what
private, which defeat the purpose of the rework. I would rather enforce
!PageGuest() when share SEPT is populated in addition to enforcing
PageGuest() fro private SEPT.
Isn't that what omitting FOLL_GUEST would accomplish?  For shared memory,
including mapping memory into the shared EPT, KVM will omit FOLL_GUEST and thus
require the memory to be readable/writable according to the guest access type.
Ah. I guess I see what you're saying: we can pipe down the shared bit from
GPA from direct_page_fault() (or whatever handles the fault) down to
hva_to_pfn_slow() and omit FOLL_GUEST if the shared bit is set. Right?
Yep.
I guest it's doable, but codeshuffling going to be ugly.
It shouldn't be too horrific.  If it is horrific, I'd be more than happy to
refactor the flow before hand to collect the hva_to_pfn() params into a struct
so that adding a "private" flag is less painful.  There is already TDX-related
work to do similar cleanup in the x86-specific code.

https://lkml.kernel.org/r/cover.1618914692.git.isaku.yamahata@intel.com
quoted
By definition, that excludes PageGuest() because PageGuest() pages must always
be unmapped, e.g. PROTNONE.  And for private EPT, because PageGuest() is always
PROTNONE or whatever, it will require FOLL_GUEST to retrieve the PTE/PMD/Pxx.

On a semi-related topic, I don't think can_follow_write_pte() is the correct
place to hook PageGuest().  TDX's S-EPT has a quirk where all private guest
memory must be mapped writable, but that quirk doesn't hold true for non-TDX
guests.  It should be legal to map private guest memory as read-only.
Hm. The point of the change in can_follow_write_pte() is to only allow to
write to a PageGuest() page if FOLL_GUEST is used and the mapping is
writable. Without the change gup(FOLL_GUEST|FOLL_WRITE) would fail.

It doesn't prevent using read-only guest mappings as read-only. But if you
want to write to it it has to writable (in addtion to FOLL_GUEST). 
100% agree that the page needs to be host-writable to be mapped as writable.
What I was pointing out is that if FOLL_WRITE is not set, gup() will never check
the PageGuest() exemption (moot point until the protnone check is fixed), and
more importantly that the FOLL_GUEST check is orthogonal to the FOLL_WRITE check.

In other words, I would expect the code to look something ike:

	if (PageGuest()) {
		if (!(flags & FOLL_GUEST)) {
			pte_unmap_unlock(ptep, ptl);
			return NULL;
		}
	} else if ((flags & FOLL_NUMA) && pte_protnone(pte)) {
		goto no_page;
	}
quoted
And I believe the below snippet in follow_page_pte() will be problematic
too, since FOLL_NUMA is added unless FOLL_FORCE is set.  I suspect the
correct approach is to handle FOLL_GUEST as an exception to
pte_protnone(), though that might require adjusting pte_protnone() to be
meaningful even when CONFIG_NUMA_BALANCING=n.

	if ((flags & FOLL_NUMA) && pte_protnone(pte))
		goto no_page;
	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) {
		pte_unmap_unlock(ptep, ptl);
		return NULL;
	}
Good catch. I'll look into how to untangle NUMA balancing and PageGuest().
It shouldn't be hard. PageGuest() pages should be subject for balancing.
quoted
quoted
Do you see any problems with this?
quoted
Oh, and the other nicety is that I think it would avoid having to explicitly
handle PageGuest() memory that is being accessed from kernel/KVM, i.e. if all
memory exposed to KVM must be !PageGuest(), then it is also eligible for
copy_{to,from}_user().
copy_{to,from}_user() enforce by setting PTE entries to PROT_NONE.
But KVM does _not_ want those PTEs PROT_NONE.  If KVM is accessing memory that
is also accessible by the the guest, then it must be shared.  And if it's shared,
it must also be accessible to host userspace, i.e. something other than PROT_NONE,
otherwise the memory isn't actually shared with anything.

As above, any guest-accessible memory that is accessed by the host must be
shared, and so must be mapped with the required permissions.
I don't see contradiction here: copy_{to,from}_user() would fail with
-EFAULT on PROT_NONE PTE.

By saying in initial posting that inserting PageGuest() into shared is
fine, I didn't mean it's usefule, just allowed.
Yeah, and I'm saying we should explicitly disallow mapping PageGuest() into
shared memory, and then the KVM code that manually kmaps() PageGuest() memory
to avoid copy_{to,from}_user() failure goes aways.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help