Thread (23 messages) 23 messages, 6 authors, 2022-01-03

Re: [PATCH v2 03/11] mm/gup: migrate PIN_LONGTERM dev coherent pages to system

From: Jason Gunthorpe <jgg@ziepe.ca>
Date: 2021-12-08 13:54:09
Also in: amd-gfx, dri-devel, linux-mm, linux-xfs

On Wed, Dec 08, 2021 at 10:31:58PM +1100, Alistair Popple wrote:
On Tuesday, 7 December 2021 5:52:43 AM AEDT Alex Sierra wrote:
quoted
Avoid long term pinning for Coherent device type pages. This could
interfere with their own device memory manager.
If caller tries to get user device coherent pages with PIN_LONGTERM flag
set, those pages will be migrated back to system memory.

Signed-off-by: Alex Sierra <redacted>
 mm/gup.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 886d6148d3d0..1572eacf07f4 100644
+++ b/mm/gup.c
@@ -1689,17 +1689,37 @@ struct page *get_dump_page(unsigned long addr)
 #endif /* CONFIG_ELF_CORE */
 
 #ifdef CONFIG_MIGRATION
+static int migrate_device_page(unsigned long address,
+				struct page *page)
+{
+	struct vm_area_struct *vma = find_vma(current->mm, address);
+	struct vm_fault vmf = {
+		.vma = vma,
+		.address = address & PAGE_MASK,
+		.flags = FAULT_FLAG_USER,
+		.pgoff = linear_page_index(vma, address),
+		.gfp_mask = GFP_KERNEL,
+		.page = page,
+	};
+	if (page->pgmap && page->pgmap->ops->migrate_to_ram)
+		return page->pgmap->ops->migrate_to_ram(&vmf);
How does this synchronise against pgmap being released? As I understand things
at this point we're not holding a reference on either the page or pgmap, so
the page and therefore the pgmap may have been freed.
For sure, this can't keep touching the pages[] array after it unpinned
them:
quoted
 	if (gup_flags & FOLL_PIN) {
 		unpin_user_pages(pages, nr_pages);
               ^^^^^^^^^^^^^^^^^^^
quoted
 	} else {
 		for (i = 0; i < nr_pages; i++)
 			put_page(pages[i]);
 	}
+	if (is_device_page(head))
+		return migrate_device_page(start + page_index * PAGE_SIZE, head);
It was safe before this patch as isolate_lru_page(head) has a
get_page() inside.

Also, please try hard not to turn this function into goto spaghetti
I think a similar problem exists for device private fault handling as well and
it has been on my list of things to fix for a while. I think the solution is to
call try_get_page(), except it doesn't work with device pages due to the whole
refcount thing. That issue is blocking a fair bit of work now so I've started
looking into it.
Where is this?

Jason
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help