Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory
From: Vlastimil Babka <hidden>
Date: 2023-01-24 07:53:43
Also in:
kvm, linux-arch, linux-doc, linux-fsdevel, linux-mm, lkml, qemu-devel
On 1/24/23 00:38, Sean Christopherson wrote:
On Mon, Jan 23, 2023, Huang, Kai wrote:quoted
On Mon, 2023-01-23 at 15:03 +0100, Vlastimil Babka wrote:quoted
On 12/22/22 01:37, Huang, Kai wrote:quoted
quoted
quoted
I argue that this page pinning (or page migration prevention) is not tied to where the page comes from, instead related to how the page will be used. Whether the page is restrictedmem backed or GUP() backed, once it's used by current version of TDX then the page pinning is needed. So such page migration prevention is really TDX thing, even not KVM generic thing (that's why I think we don't need change the existing logic of kvm_release_pfn_clean()).This essentially boils down to who "owns" page migration handling, and sadly, page migration is kinda "owned" by the core-kernel, i.e. KVM cannot handle page migration by itself -- it's just a passive receiver. For normal pages, page migration is totally done by the core-kernel (i.e. it unmaps page from VMA, allocates a new page, and uses migrate_pape() or a_ops-quoted
migrate_page() to actually migrate the page).In the sense of TDX, conceptually it should be done in the same way. The more important thing is: yes KVM can use get_page() to prevent page migration, but when KVM wants to support it, KVM cannot just remove get_page(), as the core- kernel will still just do migrate_page() which won't work for TDX (given restricted_memfd doesn't have a_ops->migrate_page() implemented). So I think the restricted_memfd filesystem should own page migration handling, (i.e. by implementing a_ops->migrate_page() to either just reject page migration or somehow support it).While this thread seems to be settled on refcounts already,I am not sure but will let Sean/Paolo to decide.My preference is whatever is most performant without being hideous :-)quoted
quoted
just wanted to point out that it wouldn't be ideal to prevent migrations by a_ops->migrate_page() rejecting them. It would mean cputime wasted (i.e. by memory compaction) by isolating the pages for migration and then releasing them after the callback rejects it (at least we wouldn't waste time creating and undoing migration entries in the userspace page tables as there's no mmap). Elevated refcount on the other hand is detected very early in compaction so no isolation is attempted, so from that aspect it's optimal.I am probably missing something,Heh, me too, I could have sworn that using refcounts was the least efficient way to block migration.
Well I admit that due to my experience with it, I do mostly consider
migration through memory compaction POV, which is a significant user of
migration on random pages that's not requested by userspace actions on
specific ranges.
And compaction has in isolate_migratepages_block():
/*
* Migration will fail if an anonymous page is pinned in memory,
* so avoid taking lru_lock and isolating it unnecessarily in an
* admittedly racy check.
*/
mapping = page_mapping(page);
if (!mapping && page_count(page) > page_mapcount(page))
goto isolate_fail;
so that prevents migration of pages with elevated refcount very early,
before they are even isolated, so before migrate_pages() is called.
But it's true there are other sources of "random pages migration" - numa
balancing, demotion in lieu of reclaim... and I'm not sure if all have
such early check too.
Anyway, whatever is decided to be a better way than elevated refcounts,
would ideally be checked before isolation as well, as that's the most
efficient way.
quoted
but IIUC the checking of refcount happens at very last stage of page migration too