Thread (27 messages), 6 authors, 2021-02-17

Re: [PATCH 0/9] Add support for SVM atomics in Nouveau

From: Daniel Vetter <hidden>
Date: 2021-02-09 13:45:43
Also in: dri-devel, linux-doc, lkml, nouveau

On Tue, Feb 9, 2021 at 2:35 PM Jason Gunthorpe [off-list ref] wrote:
On Tue, Feb 09, 2021 at 11:57:28PM +1100, Alistair Popple wrote:
On Tuesday, 9 February 2021 9:27:05 PM AEDT Daniel Vetter wrote:
Recent changes to pin_user_pages() prevent the creation of pinned pages in
ZONE_MOVABLE. This series allows pinned pages to be created in ZONE_MOVABLE
as attempts to migrate may fail which would be fatal to userspace.

In this case migration of the pinned page is unnecessary as the page can be
unpinned at anytime by having the driver revoke atomic permission as it
does for the migrate_to_ram() callback. However a method of calling this
when memory needs to be moved has yet to be resolved so any discussion is
welcome.
Why do we need to pin for gpu atomics? You still have the callback for
cpu faults, so you
can move the page as needed, and hence a long-term pin sounds like the
wrong approach.
Technically a real long term unmoveable pin isn't required, because as you say
the page can be moved as needed at any time. However I needed some way of
stopping the CPU page from being freed once the userspace mappings for it had
been removed.
The issue is you took the page out of the PTE it belongs to, which
makes it orphaned and unlocatable by the rest of the mm?

Ideally this would leave the PTE in place so everything continues to
work, just disable CPU access to it.

Maybe some kind of special swap entry?
I probably should have read the patches in more detail; I was assuming
the ZONE_DEVICE is only for vram. At least I thought the requirement
for gpu atomics was that the page is in vram, but maybe I'm mixing up
how this works on nvidia with how it works in other places. Iirc we
had a long discussion about this at lpc19 that ended with the
conclusion that we must be able to migrate, and sometimes migration is
blocked. But the details elude me now.

Either way, ZONE_DEVICE for anything that is not vram/device memory sounds wrong. Is
that really going on here?
-Daniel
I also don't much like the use of ZONE_DEVICE here, that should only
be used for actual device memory, not as a temporary proxy for CPU
pages. Having two struct pages refer to the same physical memory is
pretty ugly.
The normal solution of registering an MMU notifier to unpin the page when it
needs to be moved also doesn't work as the CPU page tables now point to the
device-private page and hence the migration code won't call any invalidate
notifiers for the CPU page.
The fact the page is lost from the MM seems to be the main issue here.
Yes, I would like to avoid the long term pin constraints as well if possible; I
just haven't found a solution yet. Are you suggesting it might be possible to
add a callback in the page migration logic to specially deal with moving these
pages?
How would migration even find the page?

Jason
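
To make the "special swap entry" idea raised above a bit more concrete, here is a toy user-space model in Python. It is not kernel code and does not reflect any existing kernel API; every name in it (ToyAddressSpace, DEVICE_EXCLUSIVE, and so on) is hypothetical. It only sketches the property being asked for: the entry stays in the page table and still records the page, so a migration walk can find the page and revoke device access, while a CPU touch faults and revokes as well.

```python
from dataclasses import dataclass
from enum import Enum, auto


class PteKind(Enum):
    PRESENT = auto()           # normal mapping, CPU may access the page
    DEVICE_EXCLUSIVE = auto()  # hypothetical non-present entry that still records the pfn


@dataclass
class Pte:
    kind: PteKind
    pfn: int


class ToyAddressSpace:
    """Toy stand-in for one process's page table: virtual address -> Pte."""

    def __init__(self):
        self.ptes = {}
        self.revocations = []  # (vaddr, pfn) recorded each time device access is revoked

    def map_page(self, vaddr, pfn):
        self.ptes[vaddr] = Pte(PteKind.PRESENT, pfn)

    def make_device_exclusive(self, vaddr):
        # Keep the entry (and its pfn) in place; only CPU access is disabled.
        self.ptes[vaddr].kind = PteKind.DEVICE_EXCLUSIVE

    def _revoke(self, vaddr):
        pte = self.ptes[vaddr]
        if pte.kind is PteKind.DEVICE_EXCLUSIVE:
            self.revocations.append((vaddr, pte.pfn))
            pte.kind = PteKind.PRESENT

    def cpu_access(self, vaddr):
        # A CPU touch of a non-present entry "faults": revoke the device, then proceed.
        self._revoke(vaddr)
        return self.ptes[vaddr].pfn

    def find_mappings(self, pfn):
        # What a reverse-map style walk needs: the page is still locatable by pfn.
        return [v for v, p in self.ptes.items() if p.pfn == pfn]

    def migrate(self, old_pfn, new_pfn):
        # Migration finds every mapping, revokes device access, then retargets the entry.
        moved = self.find_mappings(old_pfn)
        for vaddr in moved:
            self._revoke(vaddr)
            self.ptes[vaddr].pfn = new_pfn
        return moved
```

In this model the question "how would migration even find the page?" has an answer only because the entry is never removed from the table; if the entry instead pointed at a separate proxy page, find_mappings() on the original pfn would return nothing.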


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch