Re: [dpdk-dev] [RFC PATCH 0/1] Dataplane Workload Accelerator library

From: Jerin Jacob <hidden>
Date: 2021-10-29 15:52:15

On Fri, Oct 29, 2021 at 5:27 PM Mattias Rönnblom
[off-list ref] wrote:

On 2021-10-25 11:03, Jerin Jacob wrote:

quoted

On Mon, Oct 25, 2021 at 1:05 PM Mattias Rönnblom
[off-list ref] wrote:

quoted

On 2021-10-19 20:14, jerinj@marvell.com wrote:

quoted

From: Jerin Jacob <redacted>


Dataplane Workload Accelerator library
======================================

Definition of Dataplane Workload Accelerator
--------------------------------------------
A Dataplane Workload Accelerator (DWA) typically contains a set of CPUs,
network controllers, and programmable data acceleration engines for
packet processing, cryptography, regex matching, baseband processing, etc.
This allows a DWA to offload compute, packet processing, baseband, and
cryptography-related workloads from the host CPU to save cost and power.
It also allows the workload to be scaled by adding DWAs to the host CPU as needed.

Unlike other devices in DPDK, the DWA device is not fixed-function
due to the fact that it has CPUs and programmable HW accelerators.

There are already several instances of DPDK devices with pure-software
implementation. In this regard, a DPU/SmartNIC represents nothing new.
What's new, it seems to me, is a much-increased need to
configure/arrange the processing in complex manners, to avoid bouncing
everything to the host CPU.

Yes and no. It depends on the profile. The TLV type TYPE_USER_PLANE carries
user plane traffic from/to the host. For example, when offloading an ORAN
split 7.2 baseband profile, transport blocks are sent to/from the host as
TYPE_USER_PLANE.

quoted

Something like P4 or rte_flow-based hooks or
some other kind of extension. The eventdev adapters solve the same
problem (where on some systems packets go through the host CPU on their
way to the event device, and others do not) - although on a *much*
smaller scale.

Yes. Eventdev adapters are only for event device plumbing.

quoted

"Not-fixed function" seems to call for more hot plug support in the
device APIs. Such functionality could then be reused by anything that
can be reconfigured dynamically (FPGAs, firmware-programmed
accelerators, etc.),

Yes.

quoted

but which may not be able to serve as a RPC
endpoint, like a SmartNIC.

It can. That's the reason for choosing TLVs, so that
any higher-level language can use a TLV library like https://github.com/ustropo/uttlv
to communicate with the accelerator. TLVs follow a request and
response scheme like RPC, so the application can wrap them as RPC if needed.
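
To make the request/response framing concrete, here is a minimal sketch of packing and unpacking one TLV in a flat buffer. The `demo_tlv_hdr` layout and the `demo_tlv_encode`/`demo_tlv_decode` helpers are hypothetical names invented for illustration; they are not the structures or functions proposed in the RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header. The field widths are an assumption made for
 * this sketch, not the layout used by the RFC. */
struct demo_tlv_hdr {
	uint16_t tag;  /* message family, e.g. a host ethernet port */
	uint16_t stag; /* sub-tag selecting a request or a response */
	uint32_t len;  /* payload length in bytes */
};

/* Pack header and payload into buf.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t
demo_tlv_encode(uint8_t *buf, size_t cap, uint16_t tag, uint16_t stag,
		const void *payload, uint32_t len)
{
	struct demo_tlv_hdr hdr = { .tag = tag, .stag = stag, .len = len };

	assert(buf != NULL);
	if (cap < sizeof(hdr) || cap - sizeof(hdr) < len)
		return 0;
	memcpy(buf, &hdr, sizeof(hdr));
	if (len != 0)
		memcpy(buf + sizeof(hdr), payload, len);
	return sizeof(hdr) + len;
}

/* Read the header back out of buf.
 * Returns a pointer to the payload, or NULL if the buffer is truncated. */
static const uint8_t *
demo_tlv_decode(const uint8_t *buf, size_t size, struct demo_tlv_hdr *hdr)
{
	if (size < sizeof(*hdr))
		return NULL;
	memcpy(hdr, buf, sizeof(*hdr));
	if (size - sizeof(*hdr) < hdr->len)
		return NULL;
	return buf + sizeof(*hdr);
}
```

Because the message is already a packed byte range, the same buffer could in principle be handed to whatever transport sits between host and accelerator, and a binding in another language only needs to produce and consume these bytes.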

quoted

DWA could be some kind of DPDK-internal framework for managing certain
type of DPUs, but should it be exposed to the user application?

Could you clarify a bit more?
The offload is represented as a set of TLVs in a generic fashion. There
is no DPU-specific bit in the offload representation. See the
rte_dwa_profile_l3fwd.h header file.


It seems a bit cumbersome to work with TLVs on the user application
side. Would it be an alternative to have the profile API as a set of C
APIs instead of TLV-based messaging interface? The underlying
implementation could still - in many or all cases - be TLVs sent over
some appropriate transport.

The reasons to pick TLVs are as follows:

1) It is very easy to maintain ABI compatibility (learned from rte_flow).
2) If a message needs to be transported over a network etc., it needs to be
packed. With TLVs that is easy for the implementation to do, and it gives
better performance in such cases by avoiding reformatting and possibly
avoiding memcpy.
3) It is easy to plug in another high-level programming language, as there
is just one API.
4) It is easy to decouple DWA core library functionality from profiles.
5) It is easy to enable an asynchronous scheme using request and response TLVs.
6) Most importantly, we can introduce a type notion with TLVs (connected with
the type of message; see TYPE_ATTACHED, TYPE_STOPPED, TYPE_USER_PLANE, etc.).
That way, we can have a uniform outlook across profiles instead of each profile
coming with a set of its own APIs and __rules__ on the state machine.
I think, for a framework to leverage communication mechanisms and other
aspects between profiles, it's important to have some synergy between profiles.
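
As an illustration of the type notion in the last point, the sketch below shows one check that a core library could apply to every profile's messages. The enum names, the three device states, and the mapping between message type and state are assumptions made for this example; the RFC header files define the actual semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device states, assumed for this sketch. */
enum demo_dev_state {
	DEMO_STATE_ATTACHED,
	DEMO_STATE_STOPPED,
	DEMO_STATE_RUNNING,
};

/* Hypothetical message types mirroring the names mentioned in the thread. */
enum demo_tlv_type {
	DEMO_TYPE_ATTACHED,
	DEMO_TYPE_STOPPED,
	DEMO_TYPE_USER_PLANE,
};

/* One rule table for all profiles: is a message of this type acceptable
 * in the current device state? The mapping is an assumption. */
static bool
demo_tlv_type_allowed(enum demo_dev_state state, enum demo_tlv_type type)
{
	switch (type) {
	case DEMO_TYPE_ATTACHED:
		/* Assumed: accepted at any time once the profile is attached. */
		return true;
	case DEMO_TYPE_STOPPED:
		/* Assumed: configuration messages need a stopped device. */
		return state == DEMO_STATE_STOPPED;
	case DEMO_TYPE_USER_PLANE:
		/* Assumed: user plane traffic needs a running device. */
		return state == DEMO_STATE_RUNNING;
	}
	assert(0); /* unknown type: programming error */
	return false;
}
```

With a shared check like this, each profile only has to label its messages with a type rather than document its own state machine rules.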


Yes. I agree that a bit more logic is required on the application side
to use TLVs, but I think we can have wrapper functions that take request
and response structures.
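
A rough sketch of what such a wrapper could look like follows. Every name here (`demo_tlv`, `demo_port_info`, `demo_transport_xfer`, `demo_port_info_get`) is hypothetical, and the transport is a local stub standing in for the device side, so this only shows the shape of the idea, not the RFC API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sub-tags for an info request and its paired response. */
enum demo_stag { DEMO_STAG_H2D_INFO, DEMO_STAG_D2H_INFO };

/* Hypothetical fixed-size TLV used only in this sketch. */
struct demo_tlv {
	uint16_t tag;
	uint16_t stag;
	uint32_t len;
	uint8_t msg[32];
};

/* Hypothetical response payload. */
struct demo_port_info {
	uint16_t nb_rx_queues;
	uint16_t nb_tx_queues;
};

/* Stub standing in for the device: answers an info request with
 * made-up queue counts. A real transport would go over PCIe, a
 * network link, etc. */
static int
demo_transport_xfer(const struct demo_tlv *req, struct demo_tlv *rsp)
{
	struct demo_port_info info = { .nb_rx_queues = 4, .nb_tx_queues = 2 };

	if (req->stag != DEMO_STAG_H2D_INFO)
		return -1;
	rsp->tag = req->tag;
	rsp->stag = DEMO_STAG_D2H_INFO;
	rsp->len = sizeof(info);
	memcpy(rsp->msg, &info, sizeof(info));
	return 0;
}

/* Typed C wrapper: builds the request TLV, sends it, validates the
 * paired response, and copies the payload into the caller's struct. */
static int
demo_port_info_get(uint16_t tag, struct demo_port_info *out)
{
	struct demo_tlv req = { .tag = tag, .stag = DEMO_STAG_H2D_INFO, .len = 0 };
	struct demo_tlv rsp;

	assert(out != NULL);
	memset(&rsp, 0, sizeof(rsp));
	if (demo_transport_xfer(&req, &rsp) != 0)
		return -1;
	if (rsp.stag != DEMO_STAG_D2H_INFO || rsp.len != sizeof(*out))
		return -1;
	memcpy(out, rsp.msg, sizeof(*out));
	return 0;
}
```

The application then calls `demo_port_info_get(tag, &info)` and never touches the TLV encoding directly, while the TLV remains the unit that is versioned and transported.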

Such a C API could still be asynchronous, and still be a profile API
(rather than a set of new DPDK device types).


What I tried to ask during the meeting, but didn't get an answer to
(or at least not one that I could understand), was how the profiles were
to be specified and/or documented. Maybe the above is what you had in mind
already.

Yes. Documentation is easy. Please check the RFC header file for the Doxygen
markup used to express all the attributes of a TLV.


+enum rte_dwa_port_host_ethernet {
+ /**
+ * Attribute |  Value
+ * ----------|--------
+ * Tag       | RTE_DWA_TAG_PORT_HOST_ETHERNET
+ * Stag      | RTE_DWA_STAG_PORT_HOST_ETHERNET_H2D_INFO
+ * Direction | H2D
+ * Type      | TYPE_ATTACHED
+ * Payload   | NA
+ * Pair TLV  | RTE_DWA_STAG_PORT_HOST_ETHERNET_D2H_INFO
+ *
+ * Request DWA host ethernet port information.
+ */
+ RTE_DWA_STAG_PORT_HOST_ETHERNET_H2D_INFO,
+ /**
+ * Attribute |  Value
+ * ----------|---------
+ * Tag       | RTE_DWA_TAG_PORT_HOST_ETHERNET
+ * Stag      | RTE_DWA_STAG_PORT_HOST_ETHERNET_D2H_INFO
+ * Direction | D2H
+ * Type      | TYPE_ATTACHED
+ * Payload   | struct rte_dwa_port_host_ethernet_d2h_info
+ * Pair TLV  | RTE_DWA_STAG_PORT_HOST_ETHERNET_H2D_INFO
+ *
+ * Response for DWA host ethernet port information.
+ */
+ RTE_DWA_STAG_PORT_HOST_ETHERNET_D2H_INFO,
