Thread: 43 messages, 7 authors, 2021-08-20

Re: [PATCH v6 05/13] drm/amdkfd: generic type as sys mem on migration to ram

From: Felix Kuehling <felix.kuehling@amd.com>
Date: 2021-08-17 00:42:48
Also in: amd-gfx, dri-devel, linux-mm, linux-xfs

On 2021-08-16 at 6:06 p.m., Zeng, Oak wrote:
> Regards,
> Oak
>
> On 2021-08-16, 3:53 PM, "amd-gfx on behalf of Sierra Guiza, Alejandro (Alex)" <amd-gfx-bounces@lists.freedesktop.org on behalf of alex.sierra@amd.com> wrote:
>
>     On 8/15/2021 10:38 AM, Christoph Hellwig wrote:
>     > On Fri, Aug 13, 2021 at 01:31:42AM -0500, Alex Sierra wrote:
>     >>   	migrate.vma = vma;
>     >>   	migrate.start = start;
>     >>   	migrate.end = end;
>     >> -	migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>     >>   	migrate.pgmap_owner = SVM_ADEV_PGMAP_OWNER(adev);
>     >>   
>     >> +	if (adev->gmc.xgmi.connected_to_cpu)
>     >> +		migrate.flags = MIGRATE_VMA_SELECT_SYSTEM;
>     >> +	else
>     >> +		migrate.flags = MIGRATE_VMA_SELECT_DEVICE_PRIVATE;
>     > It's been a while since I touched this migrate code, but doesn't this
>     > mean that if the range already contains system memory, the migration
>     > now won't do anything for the connected_to_cpu case?
>
>     For the condition above, when connected_to_cpu is set, we're explicitly
>     migrating from device memory of the device generic type to system memory.
>
> For the MEMORY_DEVICE_GENERIC memory type, why do we need to explicitly migrate it from device memory to normal system memory? I thought the design was that, for this type of memory, the CPU can access it in place without migration (just as the CPU accesses normal system memory), so there is no need to migrate it to normal system memory...
>
> With this patch, the migration behavior will be: when memory is accessed by the CPU, it will be migrated to normal system memory; when memory is accessed by the GPU, it will be migrated to device VRAM. This is basically the same behavior as when VRAM is treated as DEVICE_PRIVATE.
>
> I thought the whole goal of introducing DEVICE_GENERIC was to avoid such back-and-forth migration between device memory and normal system memory. But maybe I am missing something here...
Hi Oak,

By using MEMORY_DEVICE_GENERIC we can avoid CPU page faults triggering
migration back to system memory on every CPU access on the Frontier
system architecture, because such pages can be mapped in the CPU page
table. You're right that this is the reason for the whole patch series.

But we still need the ability to migrate from MEMORY_DEVICE_GENERIC to
system memory for reasons other than CPU page faults. Applications can
request migrations explicitly (hipMemPrefetchAsync). Or we can be forced
to migrate data due to memory pressure from other allocations (evictions
in the TTM memory allocator).
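The distinction drawn here between CPU access and the other migration triggers can be summarized as a small decision table. The sketch below is an illustrative userspace model only, assuming the behavior described in this thread; `enum vram_type`, `enum migration_trigger` and `migrates_to_ram()` are hypothetical names, not amdkfd or kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; these names are not amdkfd or kernel types. */
enum vram_type {
	VRAM_DEVICE_PRIVATE,	/* not CPU-mappable: CPU access faults */
	VRAM_DEVICE_GENERIC,	/* mapped in the CPU page table (connected_to_cpu) */
};

enum migration_trigger {
	TRIGGER_CPU_ACCESS,	/* CPU touches a page resident in VRAM */
	TRIGGER_PREFETCH,	/* explicit request, e.g. hipMemPrefetchAsync */
	TRIGGER_EVICTION,	/* memory pressure, e.g. TTM eviction */
};

/* Returns true if a VRAM-resident page of the given type is migrated to
 * system memory in response to the trigger, as described in the thread. */
static bool migrates_to_ram(enum vram_type type, enum migration_trigger trigger)
{
	switch (trigger) {
	case TRIGGER_CPU_ACCESS:
		/* GENERIC pages are CPU-mapped, so CPU access needs no migration. */
		return type == VRAM_DEVICE_PRIVATE;
	case TRIGGER_PREFETCH:
	case TRIGGER_EVICTION:
		/* Both types still need a path back to system memory. */
		return true;
	}
	assert(!"unknown migration trigger");
	return false;
}
```

In this model, `migrates_to_ram(VRAM_DEVICE_GENERIC, TRIGGER_CPU_ACCESS)` is false while `migrates_to_ram(VRAM_DEVICE_GENERIC, TRIGGER_EVICTION)` is true, which is the point of the reply: GENERIC memory avoids fault-driven migration, but explicit prefetches and evictions still move it to system memory.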

Regards,
  Felix

> Regards,
> Oak
>
>     In this type, device PTEs are present in the CPU page table.
>
>     During the migrate_vma_collect_pmd walk op in the migrate_vma_setup call,
>     there's a condition for present PTEs that requires migrate->flags to
>     include MIGRATE_VMA_SELECT_SYSTEM. Otherwise, the migration for this
>     entry will be ignored.
>
>     Regards,
>     Alex S.
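The per-entry filter described above, together with the flag selection in the quoted hunk, can be modeled in plain userspace C. This is a hedged sketch, not kernel code: `SEL_SYSTEM`, `SEL_DEVICE_PRIVATE`, `enum pte_state`, `select_flags()` and `collects_entry()` are hypothetical stand-ins for `MIGRATE_VMA_SELECT_SYSTEM`, `MIGRATE_VMA_SELECT_DEVICE_PRIVATE` and the checks in `migrate_vma_collect_pmd()`, and details such as the `pgmap_owner` match for device-private entries are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for MIGRATE_VMA_SELECT_SYSTEM and
 * MIGRATE_VMA_SELECT_DEVICE_PRIVATE; values are local to this model. */
#define SEL_SYSTEM		(1u << 0)
#define SEL_DEVICE_PRIVATE	(1u << 1)

/* Hypothetical, simplified view of a CPU page-table entry during the walk. */
enum pte_state {
	PTE_PRESENT,		/* present PTE: system RAM or DEVICE_GENERIC VRAM */
	PTE_DEVICE_PRIVATE,	/* non-present entry for DEVICE_PRIVATE VRAM */
};

/* Same choice as the patch hunk: select present (CPU-mapped) pages on
 * connected_to_cpu systems, device-private entries otherwise. */
static unsigned int select_flags(bool connected_to_cpu)
{
	return connected_to_cpu ? SEL_SYSTEM : SEL_DEVICE_PRIVATE;
}

/* Simplified per-entry filter: an entry whose kind is not selected by
 * flags is skipped, i.e. not collected for migration. */
static bool collects_entry(enum pte_state pte, unsigned int flags)
{
	switch (pte) {
	case PTE_PRESENT:
		return (flags & SEL_SYSTEM) != 0;
	case PTE_DEVICE_PRIVATE:
		/* The real code also checks the pgmap owner; omitted here. */
		return (flags & SEL_DEVICE_PRIVATE) != 0;
	}
	assert(!"unknown pte_state");
	return false;
}
```

With `select_flags(true)`, a `PTE_PRESENT` entry is collected; with `select_flags(false)`, the same entry is skipped. That skipped case is what would happen to CPU-mapped DEVICE_GENERIC pages if the connected_to_cpu path kept `MIGRATE_VMA_SELECT_DEVICE_PRIVATE`.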