Thread: 35 messages, 5 authors, 2020-07-28

Re: [RFC PATCH v2 21/21] netgpu/nvidia: add Nvidia plugin for netgpu

From: "Chris Mason" <clm@fb.com>
Date: 2020-07-28 17:18:58

On 28 Jul 2020, at 12:31, Greg KH wrote:
On Mon, Jul 27, 2020 at 03:44:44PM -0700, Jonathan Lemon wrote:
From: Jonathan Lemon <redacted>

This provides the interface between the netgpu core module and the
nvidia kernel driver.  This should be built as an external module,
pointing to the nvidia build.  For example:

export NV_PACKAGE_DIR=/w/nvidia/NVIDIA-Linux-x86_64-440.64
make -C ${kdir} M=`pwd` O=obj $*

Ok, now you are just trolling us.

Nice job, I shouldn't have read the previous patches.

Please, go get a lawyer to sign-off on this patch, with their corporate
email address on it.  That's the only way we could possibly consider
something like this.

Oh, and we need you to use your corporate email address too, as you are
not putting copyright notices on this code, we will need to know who to
come after in the future.

Jonathan, I think we need to do a better job talking about patches that
are just meant to enable possible users vs patches that we actually hope
the upstream kernel will take.  Obviously code that only supports out of
tree drivers isn’t a good fit for the upstream kernel.  From the point
of view of experimenting with these patches, GPUs benefit a lot from
this functionality so I think it does make sense to have the enabling
patches somewhere, just not in this series.

We’re finding it more common to have PCIe switch hops between a [ GPU,
NIC ] pair and the CPU, which gives a huge advantage to out of tree 
drivers or extensions that can DMA directly between the GPU/NIC without 
having to copy through the CPU.  I’d love to have an alternative built 
on TCP because that’s where we invest the vast majority of our tuning, 
security and interoperability testing.  It’s just more predictable 
overall.

This isn’t a new story, but if we can layer on APIs that enable this 
cleanly for in-tree drivers, we can work with the vendors to use better 
supported APIs and have a more stable kernel.  Obviously this is an RFC 
and there’s a long road ahead, but as long as the upstream kernel 
doesn’t provide an answer, out of tree drivers are going to fill in 
the weak spots.

Other possible use cases would also include other GPUs or my
favorite:

NVMe <-> filesystem <-> NIC with io_uring driving the IO and without
copies.

-chris