Thread (128 messages, 11 authors, 2021-11-08)

Re: [dpdk-dev] [PATCH v3 0/9] GPU library

From: Jerin Jacob <hidden>
Date: 2021-10-11 13:30:52

On Mon, Oct 11, 2021 at 6:14 PM Thomas Monjalon [off-list ref] wrote:
11/10/2021 13:41, Jerin Jacob:
On Mon, Oct 11, 2021 at 3:57 PM Thomas Monjalon [off-list ref] wrote:
11/10/2021 11:29, Jerin Jacob:
On Mon, Oct 11, 2021 at 2:42 PM Thomas Monjalon [off-list ref] wrote:
11/10/2021 10:43, Jerin Jacob:
On Mon, Oct 11, 2021 at 1:48 PM Thomas Monjalon [off-list ref] wrote:
10/10/2021 12:16, Jerin Jacob:
On Fri, Oct 8, 2021 at 11:13 PM [off-list ref] wrote:
From: eagostini <redacted>

In a heterogeneous computing system, processing is not only in the CPU.
Some tasks can be delegated to devices working in parallel.

The goal of this new library is to enhance the collaboration between
DPDK, that's primarily a CPU framework, and GPU devices.

When mixing network activity with task processing on a non-CPU device,
there may be a need for the CPU to communicate with the device
in order to manage memory, synchronize operations, exchange info, etc.

This library provides a number of new features:
- Interoperability with GPU-specific libraries through generic handlers
- Possibility to allocate and free memory on the GPU
- Possibility to allocate and free memory on the CPU but visible from the GPU
- Communication functions to enhance the dialog between the CPU and the GPU

In the RFC thread, there was one outstanding non-technical issue on this,
i.e.:
the above features are driver-specific details. Does the DPDK
_application_ need to be aware of this?
I don't see these features as driver-specific.
That is the disconnect. I see these as driver-specific details
which are not required to implement an "application"-facing API.
Indeed this is the disconnect.
I already answered but it seems you don't accept the answer.
Same with you. That is why I requested that we get opinions from others.
Some of them already provided opinions in RFC.
This is why I Cc'ed techboard.
Yes. Indeed.
First, this is not driver-specific. It is a low-level API.
What is the difference between a low-level API and a driver-level API?
The low-level API provides tools to build a feature,
but no specific feature.
For example, if we need to implement "application"-facing subsystems like bbdev,
and we make all of this a driver interface, you can still implement the
bbdev API as a driver without exposing HW-specific details (how devices
communicate with the CPU, how memory is allocated, etc.) to the "application".
There are 2 things to understand here.

First we want to allow the application using the GPU for needs which are
not exposed by any other DPDK API.

Second, if we want to implement another DPDK API like bbdev,
then the GPU implementation would be exposed as a vdev in bbdev,
using the HW GPU device, which is a PCI device in gpudev.
They are two different levels, got it?
Exactly. So what is the point of exposing a low-level driver API to the
"application"? Why is it not part of the internal driver API? My point is:
why does the application need to worry about how the CPU <-> device
communicate, CPU <-> device memory visibility, etc.?
There are two reasons.

1/ The application may want to use the GPU for some application-specific
needs which are not abstracted in DPDK API.
Yes, exactly. That is my concern. If we take this path, what is
the motivation to contribute to DPDK abstracted subsystem APIs, which
make sense for multiple vendors and every
The same applies to DPUs,
A feature-specific API is better, of course; there is no loss of motivation.
But you cannot forbid applications to have their own features on GPU.
It still can use it. We don't need DPDK APIs for that.
To put it another way: if the GPU is doing some ethdev offload, why not make it
an ethdev offload in the ethdev spec, so that another type of device can be used
and it makes sense for application writers?
If we do ethdev offload, yes we'll implement it.
And we'll do it on top of gpudev, which is the only way to share the GPU.
For example, in the future, if someone needs to add an ML (machine
learning) subsystem, enabling a proper subsystem interface
is good for DPDK. If this path is open, there is no
motivation for contribution, and the application will not have
a standard interface for doing the ML job across multiple vendors.
Wrong. It does not remove the motivation; it is a first step to build on top of.
IMO, there is no need to make the driver API public in order to build a feature API.
That is the only reason I am saying it should not be an APPLICATION
interface; it can be a DRIVER interface.
2/ This API may also be used by some feature implementation internally
in some DPDK libs or drivers.
We cannot skip the gpudev layer because this is what allows generic probing
of the HW, and gpudev allows sharing the GPU among multiple features
implemented in different libs or drivers, thanks to the "child" concept.
Again, why do applications need to know it? It is similar to a `bus`
kind of thing, where it shares the physical resources.
No it's not a bus, it is a device that we need to share.
i.e., a DPDK device class has a fixed personality, and it has an API
that abstracts application-specific end-user functionality like
ethdev, cryptodev, and eventdev, irrespective of the underlying
bus/device properties.
The goal of the lib is to allow anyone to invent any feature
which is not already available in DPDK.
Similar semantics are also required for DPU (SmartNIC)
communication. I am planning to send an RFC in the coming days
to address the issue without the application
knowing the bus/HW/driver details.
gpudev is not exposing bus/hw/driver details.
I don't understand what you mean.
See above.
We are going into circles.
Yes.
In short, Jerin wants to forbid the generic use of GPU in DPDK.
See below.
He wants only feature-specific API.
To reiterate: a feature-specific "application" API. The device-specific
bits can be a driver API, accessible to out-of-tree drivers if needed.

IMO, if we take this path, for DPU, XPU, GPU, etc., we need N different
libraries to get the job done for a specific dataplane feature.
Instead, enabling public feature APIs will make the application
portable, so it does not need to worry about which type of *PU
it runs on.

It is like restricting the functions we can run on a CPU.

And anyway we need this layer to share the GPU between multiple features.
No disagreement there. Whether that layer is a public application API
or not is the question. It is like making PCI device API calls from the
application, which makes the application device-specific.
Techboard please vote.
Yes.