Re: [PATCH 6/7] nouveau/dmem: Evict device private memory during release

From: Alistair Popple <apopple@nvidia.com>
Date: 2022-09-27 01:52:10
Also in: amd-gfx, dri-devel, linux-mm, lkml, nouveau

Felix Kuehling [off-list ref] writes:

On 2022-09-26 17:35, Lyude Paul wrote:

quoted

On Mon, 2022-09-26 at 16:03 +1000, Alistair Popple wrote:

quoted

When the module is unloaded or a GPU is unbound from the module it is
possible for device private pages to be left mapped in currently running
processes. This leads to a kernel crash when the pages are either freed
or accessed from the CPU because the GPU and associated data structures
and callbacks have all been freed.

Fix this by migrating any mappings back to normal CPU memory prior to
freeing the GPU memory chunks and associated device private pages.

Signed-off-by: Alistair Popple <apopple@nvidia.com>

---

I assume the AMD driver might have a similar issue. However I can't see
where device private (or coherent) pages actually get unmapped/freed
during teardown as I couldn't find any relevant calls to
devm_memunmap(), memunmap(), devm_release_mem_region() or
release_mem_region(). So it appears that ZONE_DEVICE pages are not being
properly freed during module unload, unless I'm missing something?

I've got no idea, will poke Ben to see if they know the answer to this

I guess we're relying on devm to release the region. Isn't the whole point of
using devm_request_free_mem_region that we don't have to remember to explicitly
release it when the device gets destroyed? I believe we had an explicit free
call at some point by mistake, and that caused a double-free during module
unload. See this commit for reference:

Argh, thanks for that pointer. I was not so familiar with
devm_request_free_mem_region()/devm_memremap_pages() as currently
Nouveau explicitly manages that itself.

commit 22f4f4faf337d5fb2d2750aff13215726814273e
Author: Philip Yang [off-list ref]
Date:   Mon Sep 20 17:25:52 2021 -0400

    drm/amdkfd: fix svm_migrate_fini warning
         Device manager releases device-specific resources when a driver
    disconnects from a device, devm_memunmap_pages and
    devm_release_mem_region calls in svm_migrate_fini are redundant.
         It causes below warning trace after patch "drm/amdgpu: Split
    amdgpu_device_fini into early and late", so remove function
    svm_migrate_fini.
         BUG: https://gitlab.freedesktop.org/drm/amd/-/issues/1718
         WARNING: CPU: 1 PID: 3646 at drivers/base/devres.c:795
    devm_release_action+0x51/0x60
    Call Trace:
        ? memunmap_pages+0x360/0x360
        svm_migrate_fini+0x2d/0x60 [amdgpu]
        kgd2kfd_device_exit+0x23/0xa0 [amdgpu]
        amdgpu_amdkfd_device_fini_sw+0x1d/0x30 [amdgpu]
        amdgpu_device_fini_sw+0x45/0x290 [amdgpu]
        amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
        drm_dev_release+0x20/0x40 [drm]
        release_nodes+0x196/0x1e0
        device_release_driver_internal+0x104/0x1d0
        driver_detach+0x47/0x90
        bus_remove_driver+0x7a/0xd0
        pci_unregister_driver+0x3d/0x90
        amdgpu_exit+0x11/0x20 [amdgpu]
         Signed-off-by: Philip Yang [off-list ref]
    Reviewed-by: Felix Kuehling [off-list ref]
    Signed-off-by: Alex Deucher [off-list ref]

Furthermore, I guess we are assuming that nobody is using the GPU when the
module is unloaded. As long as any processes have /dev/kfd open, you won't be
able to unload the module (except by force-unload). I suppose with ZONE_DEVICE
memory, we can have references to device memory pages even when user mode has
closed /dev/kfd. We do have a cleanup handler that runs in an MMU-free-notifier.
In theory that should run after all the pages in the mm_struct have been freed.
It releases all sorts of other device resources and needs the driver to still be
there. I'm not sure if there is anything preventing a module unload before the
free-notifier runs. I'll look into that.

Right - module unload (or device unbind) is one of the other ways we can
hit this issue in Nouveau at least. You can end up with ZONE_DEVICE
pages mapped in a running process after the module has unloaded.
Although now you mention it that seems a bit wrong - the pgmap refcount
should provide some protection against that. Will have to look into
that too.

Regards,
  Felix

quoted

---
  drivers/gpu/drm/nouveau/nouveau_dmem.c | 48 +++++++++++++++++++++++++++-
  1 file changed, 48 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 66ebbd4..3b247b8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c

@@ -369,6 +369,52 @@ nouveau_dmem_suspend(struct nouveau_drm *drm)
  	mutex_unlock(&drm->dmem->mutex);
  }
  +/*
+ * Evict all pages mapping a chunk.
+ */
+void
+nouveau_dmem_evict_chunk(struct nouveau_dmem_chunk *chunk)
+{
+	unsigned long i, npages = range_len(&chunk->pagemap.range) >> PAGE_SHIFT;
+	unsigned long *src_pfns, *dst_pfns;
+	dma_addr_t *dma_addrs;
+	struct nouveau_fence *fence;
+
+	src_pfns = kcalloc(npages, sizeof(*src_pfns), GFP_KERNEL);
+	dst_pfns = kcalloc(npages, sizeof(*dst_pfns), GFP_KERNEL);
+	dma_addrs = kcalloc(npages, sizeof(*dma_addrs), GFP_KERNEL);
+
+	migrate_device_range(src_pfns, chunk->pagemap.range.start >> PAGE_SHIFT,
+			npages);
+
+	for (i = 0; i < npages; i++) {
+		if (src_pfns[i] & MIGRATE_PFN_MIGRATE) {
+			struct page *dpage;
+
+			/*
+			 * _GFP_NOFAIL because the GPU is going away and there
+			 * is nothing sensible we can do if we can't copy the
+			 * data back.
+			 */

You'll have to excuse me for a moment since this area of nouveau isn't one of
my strongpoints, but are we sure about this? IIRC __GFP_NOFAIL means infinite
retry, in the case of a GPU hotplug event I would assume we would rather just
stop trying to migrate things to the GPU and just drop the data instead of
hanging on infinite retries.

quoted

+			dpage = alloc_page(GFP_HIGHUSER | __GFP_NOFAIL);
+			dst_pfns[i] = migrate_pfn(page_to_pfn(dpage));
+			nouveau_dmem_copy_one(chunk->drm,
+					migrate_pfn_to_page(src_pfns[i]), dpage,
+					&dma_addrs[i]);
+		}
+	}
+
+	nouveau_fence_new(chunk->drm->dmem->migrate.chan, false, &fence);
+	migrate_device_pages(src_pfns, dst_pfns, npages);
+	nouveau_dmem_fence_done(&fence);
+	migrate_device_finalize(src_pfns, dst_pfns, npages);
+	kfree(src_pfns);
+	kfree(dst_pfns);
+	for (i = 0; i < npages; i++)
+		dma_unmap_page(chunk->drm->dev->dev, dma_addrs[i], PAGE_SIZE, DMA_BIDIRECTIONAL);
+	kfree(dma_addrs);
+}
+
  void
  nouveau_dmem_fini(struct nouveau_drm *drm)
  {

@@ -380,8 +426,10 @@ nouveau_dmem_fini(struct nouveau_drm *drm)
  	mutex_lock(&drm->dmem->mutex);
    	list_for_each_entry_safe(chunk, tmp, &drm->dmem->chunks, list) {
+		nouveau_dmem_evict_chunk(chunk);
  		nouveau_bo_unpin(chunk->bo);
  		nouveau_bo_ref(NULL, &chunk->bo);
+		WARN_ON(chunk->callocated);
  		list_del(&chunk->list);
  		memunmap_pages(&chunk->pagemap);
  		release_mem_region(chunk->pagemap.range.start,

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help