Thread (79 messages, 10 authors), 2021-07-06

Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

From: Jerin Jacob <hidden>
Date: 2021-06-23 07:21:36

On Wed, Jun 23, 2021 at 9:00 AM fengchengwen [off-list ref] wrote:
The above will give better performance and is the best trade-off between
performance and per-transfer variables.
We may need to have different APIs for context-aware and context-unaware
processing, with the choice of which to use determined by capability discovery.
Given that for these DMA devices the offload cost is critical, more so than
for any other device class I've looked at before, I'd like to avoid APIs
with more parameters than need to be passed around, since that just adds
extra CPU cycles to the offload.
If the driver does not support additional attributes and/or the
application does not need them, rte_dmadev_desc_t can be NULL,
so it won't have any cost in the datapath. I think we can move to
different APIs per case if we cannot abstract the problem without a
performance impact. Otherwise, it will be too much pain for applications.
Yes, currently we plan to use different APIs for different cases, e.g.:
  rte_dmadev_memcpy()    -- local-to-local memory copy
  rte_dmadev_memset()    -- fill local memory with a pattern
maybe:
  rte_dmadev_imm_data()  -- copy a very small amount of data
  rte_dmadev_p2pcopy()   -- peer-to-peer copy between different PCIe addresses

These API capabilities will be reflected in the device capability set, so that
applications can discover them through a standard API.
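A minimal sketch of the "one API per operation, gated by the capability set" approach, assuming a 64-bit capability bitmask reported by the driver. All `sketch_*` names and the capability bits are hypothetical (not the dmadev RFC's actual API), and CPU `memcpy()`/`memset()` stand in for the hardware offload:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical capability bits; names are illustrative only. */
#define SKETCH_DMA_CAP_MEMCPY   (UINT64_C(1) << 0)
#define SKETCH_DMA_CAP_MEMSET   (UINT64_C(1) << 1)
#define SKETCH_DMA_CAP_IMM_DATA (UINT64_C(1) << 2)
#define SKETCH_DMA_CAP_P2P_COPY (UINT64_C(1) << 3)

struct sketch_dma_dev {
	uint64_t capa; /* capability set reported by the (hypothetical) driver */
};

/* Local-to-local copy; a CPU memcpy stands in for the HW offload. */
static int sketch_dma_memcpy(const struct sketch_dma_dev *dev,
			     void *dst, const void *src, size_t len)
{
	assert(dev != NULL);
	if (!(dev->capa & SKETCH_DMA_CAP_MEMCPY))
		return -ENOTSUP; /* application should have checked capa first */
	memcpy(dst, src, len);
	return 0;
}

/* Fill local memory with a byte pattern; memset stands in for the HW. */
static int sketch_dma_memset(const struct sketch_dma_dev *dev,
			     void *dst, uint8_t pattern, size_t len)
{
	assert(dev != NULL);
	if (!(dev->capa & SKETCH_DMA_CAP_MEMSET))
		return -ENOTSUP;
	memset(dst, pattern, len);
	return 0;
}
```

Each operation gets its own fixed, minimal argument list, which is the point being made about offload cost; the price is one entry point per operation, which is what the next reply objects to.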

There will be a lot of combinations; it will be like an M x N cross-product
case. It won't scale.
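To put a rough number on the M x N concern: if each of M operations needs its own function for every combination of optional per-transfer attributes, and we assume (hypothetically) those attributes are k independent on/off flags, then N = 2^k. For the four operations listed above and three such flags, that is 4 x 8 = 32 entry points, versus a single fast-path enqueue that takes a prepared descriptor:

```c
#include <assert.h>
#include <stddef.h>

/* Number of distinct entry points if every operation gets one function
 * per combination of `attrs` independent on/off attributes (N = 2^attrs).
 * The independence of attributes is an illustrative assumption. */
static size_t sketch_api_count_per_combo(size_t ops, unsigned int attrs)
{
	assert(attrs < sizeof(size_t) * 8); /* avoid an undefined shift */
	return ops * ((size_t)1 << attrs);
}
```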
Currently, it is hard to define a generic DMA descriptor; I think well-defined
APIs are feasible.
I would like to understand why it is not feasible if we move the
preparation to the slow path.

i.e.:

struct rte_dmadev_desc defines all the "attributes" of all DMA devices, with
their availability advertised through capabilities. I believe that with this
scheme we can scale and incorporate all features of all DMA HW without any
performance impact.

Something like:

struct rte_dmadev_desc {
  /* Attributes available for DMA transfers on all HW, gated by capability. */
  uint16_t channel; /* channel or port (type illustrative) */
  uint16_t op;      /* copy, fill, etc. */
  /* Implementation-opaque memory as a flexible array member;
   * rte_dmadev_desc_prep() updates this memory with HW-specific information.
   */
  uint8_t impl_opq[];
};

/* Parameter and return types below are illustrative. */
/* Allocate the memory for a DMA descriptor */
struct rte_dmadev_desc *rte_dmadev_desc_alloc(uint16_t devid);
/* Convert DPDK-specific descriptors to HW-specific descriptors in the slow path */
int rte_dmadev_desc_prep(uint16_t devid, struct rte_dmadev_desc *desc);
/* Free the DMA descriptor memory */
void rte_dmadev_desc_free(uint16_t devid, struct rte_dmadev_desc *desc);

The above calls are made in the slow path.

Only the call below is made in the fast path:
/* Here desc can be NULL (in case you don't need any specific attribute
 * attached to the transfer); if needed, it can be an object that has gone
 * through rte_dmadev_desc_prep().
 */
int rte_dmadev_enq(uint16_t devid, struct rte_dmadev_desc *desc, void *src,
                   void *dest, unsigned int len, void *cookie);
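A self-contained toy model of the proposed split, to show why the fast path stays cheap: attributes are encoded once into the opaque area in the slow path, and enqueue only copies pre-encoded bytes (or a default header when desc is NULL). The `sketch_*` names, the 32-bit header encoding, and the 8-slot ring are all hypothetical stand-ins, not the proposed rte_dmadev functions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct rte_dmadev_desc. */
struct sketch_desc {
	uint16_t channel;   /* channel or port */
	uint16_t op;        /* copy, fill, etc. */
	uint8_t impl_opq[]; /* HW-specific bytes written by sketch_desc_prep() */
};

/* One job slot in a toy submission ring. */
struct sketch_job {
	uint32_t hdr;       /* pre-encoded "HW" header word */
	const void *src;
	void *dst;
	unsigned int len;
	void *cookie;
};

#define SKETCH_RING_SZ 8

struct sketch_dev {
	uint32_t default_hdr;                  /* header used when desc == NULL */
	struct sketch_job ring[SKETCH_RING_SZ];
	unsigned int tail;                     /* next free slot */
};

/* Slow path: room for the common fields plus a 32-bit opaque area. */
static struct sketch_desc *sketch_desc_alloc(void)
{
	return calloc(1, sizeof(struct sketch_desc) + sizeof(uint32_t));
}

/* Slow path: encode attributes once. The encoding (channel in the high
 * 16 bits, op in the low 16 bits) is made up for illustration. */
static void sketch_desc_prep(struct sketch_desc *desc)
{
	uint32_t hdr;

	assert(desc != NULL);
	hdr = ((uint32_t)desc->channel << 16) | desc->op;
	memcpy(desc->impl_opq, &hdr, sizeof(hdr));
}

static void sketch_desc_free(struct sketch_desc *desc)
{
	free(desc);
}

/* Fast path: no attribute decoding, just a copy of pre-encoded bytes.
 * Returns the ring slot used, or -1 if the ring is full. */
static int sketch_enq(struct sketch_dev *dev, const struct sketch_desc *desc,
		      const void *src, void *dst, unsigned int len, void *cookie)
{
	struct sketch_job *job;
	uint32_t hdr = dev->default_hdr;

	if (dev->tail == SKETCH_RING_SZ)
		return -1;
	if (desc != NULL)
		memcpy(&hdr, desc->impl_opq, sizeof(hdr));
	job = &dev->ring[dev->tail];
	job->hdr = hdr;
	job->src = src;
	job->dst = dst;
	job->len = len;
	job->cookie = cookie;
	return (int)dev->tail++;
}
```

Usage: allocate and prep a descriptor once at setup, then pass it (or NULL) on every enqueue; a real driver would copy a full HW instruction template rather than one word.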
Just to understand: I think we need to look at the HW capabilities and at how
to have a common API.
I assume the HW will have some HW job descriptors, which will be filled in by
SW and submitted to the HW.
In our HW, the job descriptor has the following main elements:

- Channel // We don't expect the application to change this per transfer
- Source address - It can be scatter-gather too - Will be changed per transfer
- Destination address - It can be scatter-gather too - Will be changed per transfer
- Transfer length - It can be scatter-gather too - Will be changed per transfer
- IOVA address where HW posts the job completion status, per job descriptor - Will be changed per transfer
- Other sideband information related to the channel // We don't expect the application to change this per transfer
- As an option, job completion can be posted as an event to rte_event_queue too // We don't expect the application to change this per transfer
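A sketch of how the per-channel versus per-transfer split in the list above maps onto descriptor filling, assuming a template built once in the slow path and a single-segment transfer (scatter-gather omitted). The layout and all `sketch_*` names are hypothetical and deliberately NOT the real OCTEON TX2 DPI instruction header; a plain pointer stands in for the completion-status IOVA:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for illustration only. */
enum sketch_comp_mode { SKETCH_COMP_POLL, SKETCH_COMP_EVENT };

#define SKETCH_STATUS_PENDING 0u
#define SKETCH_STATUS_DONE    1u

struct sketch_job_desc {
	/* Fixed per channel: filled once in the slow path. */
	uint16_t channel;
	uint16_t sideband;   /* other channel-related sideband info */
	uint8_t comp_mode;   /* enum sketch_comp_mode */
	uint8_t event_queue; /* target queue when comp_mode == SKETCH_COMP_EVENT */
	/* Changed per transfer (single segment; scatter-gather omitted). */
	uint64_t src_iova;
	uint64_t dst_iova;
	uint32_t len;
	volatile uint8_t *comp_status; /* stands in for the completion IOVA */
};

/* Fast path: start from the per-channel template, then set only the
 * per-transfer fields. */
static void sketch_job_fill(struct sketch_job_desc *job,
			    const struct sketch_job_desc *tmpl,
			    uint64_t src, uint64_t dst, uint32_t len,
			    volatile uint8_t *status)
{
	*job = *tmpl;
	job->src_iova = src;
	job->dst_iova = dst;
	job->len = len;
	job->comp_status = status;
	*status = SKETCH_STATUS_PENDING;
}

/* Simulates the HW posting a completion status for a polled job. */
static void sketch_hw_complete(const struct sketch_job_desc *job)
{
	assert(job->comp_mode == SKETCH_COMP_POLL);
	*job->comp_status = SKETCH_STATUS_DONE;
}

/* SW side: poll the status word the HW writes to. */
static int sketch_job_done(const volatile uint8_t *status)
{
	return *status == SKETCH_STATUS_DONE;
}
```

With the event option, the HW would instead deliver the completion to the configured event queue, and SW would not poll the status word at all; that path is not modelled here.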
The 'option' field looks like a software interface field rather than a HW descriptor field.
It is in the HW descriptor.
The HW is interesting: the DMA could send completions directly to the EventHWQueue,
i.e. the DMA and EventHWQueue are linked in hardware rather than by software.
Yes.
Could you point to a public driver for this HW, so we can learn more about its working
mechanism and the software-hardware collaboration?
http://code.dpdk.org/dpdk/v21.05/source/drivers/raw/octeontx2_dma/otx2_dpi_rawdev.h#L149
is the DMA instruction header.