Thread: 33 messages, 10 authors, 2021-09-02

Re: [RFC] Make use of non-dynamic dmabuf in RDMA

From: Jason Gunthorpe <jgg@ziepe.ca>
Date: 2021-08-24 19:30:57
Also in: dri-devel, linux-media, lkml

On Wed, Aug 25, 2021 at 05:15:52AM +1000, Dave Airlie wrote:
On Wed, 25 Aug 2021 at 03:36, John Hubbard [off-list ref] wrote:
On 8/24/21 10:32 AM, Jason Gunthorpe wrote:
...
And yes at least for the amdgpu driver we migrate the memory to host
memory as soon as it is pinned and I would expect that other GPU drivers
do something similar.
Well...for many topologies, migrating to host memory will result in a
dramatically slower p2p setup. For that reason, some GPU drivers may
want to allow pinning of video memory in some situations.

Ideally, you've got modern ODP devices and you don't even need to pin.
But if not, and you still hope to do high performance p2p between a GPU
and a non-ODP Infiniband device, then you would need to leave the pinned
memory in vidmem.

So I think we don't want to rule out that behavior, right? Or is the
thinking more like, "you're lucky that this old non-ODP setup works at
all, and we'll make it work by routing through host/cpu memory, but it
will be slow"?
I think it depends on the user: if the user creates memory which is
permanently located on the GPU then it should be pinnable in this way
without forced migration. But if the memory is inherently migratable
then it just cannot be pinned in the GPU at all, as we can't
indefinitely block migration from happening, e.g. if the CPU touches it
later or something.
OK. I just want to avoid creating any API-level assumptions that dma_buf_pin()
necessarily implies or requires migrating to host memory.
I'm not sure we should be allowing dma_buf_pin at all on
non-migratable memory, what's to stop someone just pinning all the
VRAM and making the GPU unusable?
IMHO the same thinking that prevents pinning all of system RAM and
making the system unusable? The GPU isn't so special here. The main
restriction is the pinned memory ulimit. For most out-of-the-box cases
this is set to something like 64k.

For the single-user HPC use cases it is made unlimited.
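
As a minimal sketch of where that limit lives: the pinned memory ulimit is the RLIMIT_MEMLOCK resource limit, which a process can read through Python's standard `resource` module (Unix only). The helper names `memlock_limits` and `describe` are made up for illustration, and the values reported depend entirely on the local configuration.

```python
import resource


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK values for this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def describe(limit):
    """Render a limit in bytes as text; RLIM_INFINITY means no cap is set."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    if limit % 1024 == 0:
        return f"{limit // 1024} KiB"
    return f"{limit} bytes"


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("soft:", describe(soft))
    print("hard:", describe(hard))
```

A setup where the limit has been lifted, as in the HPC case, would report "unlimited" here; a default install would be expected to report a small finite value.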
My impression from this is we've designed hardware that didn't
consider the problem, and now to let us use that hardware in horrible
ways we should just allow it to pin all the things.
It is more complex than that. HW that can support dynamic memory under
*everything* is complicated (and in some cases slow!). As there is
only a weak rationale to do this, we don't see it often in the
market.

Jason