Thread: 79 messages, 10 authors, 2021-07-06

Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library

From: fengchengwen <hidden>
Date: 2021-06-16 10:17:14

On 2021/6/16 15:09, Morten Brørup wrote:
quoted
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
Sent: Tuesday, 15 June 2021 18.39

On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
quoted
This patch introduces 'dmadevice' which is a generic type of DMA
device.

The APIs of the dmadev library expose some generic operations which
enable configuration and I/O with the DMA devices.

Signed-off-by: Chengwen Feng <redacted>
---
Thanks for sending this.

Of most interest to me right now are the key data-plane APIs. While we are
still in the prototyping phase, below is a draft of what we are thinking
for the key enqueue/perform_ops/completed_ops APIs.

Some key differences I note in below vs your original RFC:
* Use of void pointers rather than iova addresses. While using iova's makes
  sense in the general case when using hardware, in that it can work with
  both physical addresses and virtual addresses, if we change the APIs to use
  void pointers instead it will still work for DPDK in VA mode, while at the
  same time allow use of software fallbacks in error cases, and also a stub
  driver that uses memcpy in the background. Finally, using iova's makes the
  APIs a lot more awkward to use with anything but mbufs or similar buffers
  where we already have a pre-computed physical address.
* Use of id values rather than user-provided handles. Allowing the user/app
  to manage the amount of data stored per operation is a better solution, I
  feel, than prescribing a certain amount of in-driver tracking. Some apps may
  not care about anything other than a job being completed, while other apps
  may have significant metadata to be tracked. Taking the user-context
  handles out of the API also makes the driver code simpler.
* I've kept a single combined API for completions, which differs from the
  separate error handling completion API you propose. I need to give the
  two function approach a bit of thought, but likely both could work. If we
  (likely) never expect failed ops, then the specifics of error handling
  should not matter that much.
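
To make the void-pointer and id-value points above concrete, here is a minimal sketch of a memcpy-backed stub driver. Every name in it (stub_dma, stub_enqueue_copy, stub_perform_ops, stub_completed_ops) is hypothetical and chosen only for illustration; it is not the draft API referred to in this mail, nor the API from the RFC.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout: an illustration, not the dmadev API. */
#define STUB_RING_SIZE 64u /* power of two, so wrapping 16-bit ids map onto slots */

struct stub_op {
    void *dst;
    const void *src;
    size_t len;
};

struct stub_dma {
    struct stub_op ring[STUB_RING_SIZE];
    uint16_t next_id;   /* id the next enqueue will return */
    uint16_t submitted; /* ops with ids before this have been performed */
    uint16_t completed; /* ops with ids before this have been reported */
};

/* Queue a copy using plain virtual addresses; returns the op id, or -1 if full. */
static int stub_enqueue_copy(struct stub_dma *d, void *dst, const void *src,
                             size_t len)
{
    if ((uint16_t)(d->next_id - d->completed) >= STUB_RING_SIZE)
        return -1;
    struct stub_op *op = &d->ring[d->next_id % STUB_RING_SIZE];
    op->dst = dst;
    op->src = src;
    op->len = len;
    return d->next_id++;
}

/* "Doorbell": the stub simply runs memcpy for every queued op. */
static void stub_perform_ops(struct stub_dma *d)
{
    while (d->submitted != d->next_id) {
        const struct stub_op *op = &d->ring[d->submitted % STUB_RING_SIZE];
        memcpy(op->dst, op->src, op->len);
        d->submitted++;
    }
}

/* Report up to max newly finished ops; *last_id is set only if any finished. */
static uint16_t stub_completed_ops(struct stub_dma *d, uint16_t max,
                                   uint16_t *last_id)
{
    uint16_t n = (uint16_t)(d->submitted - d->completed);
    if (n > max)
        n = max;
    d->completed = (uint16_t)(d->completed + n);
    if (n > 0)
        *last_id = (uint16_t)(d->completed - 1);
    return n;
}
```

Because the driver hands back only an id, an application that needs per-job metadata can keep its own array indexed by id % STUB_RING_SIZE, while an application that only cares about completion keeps nothing at all.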

For the rest, the control / setup APIs are likely to be rather
uncontroversial, I suspect. However, I think that rather than xstats APIs,
the library should first provide a set of standardized stats like ethdev
does. If driver-specific stats are needed, we can add xstats later to the
API.

Appreciate your further thoughts on this, thanks.

Regards,
/Bruce
I generally agree with Bruce's points above.

I would like to share a couple of ideas for further discussion:

1. API for bulk operations.
The ability to prepare a vector of DMA operations, and then post it to the DMA driver.
We considered bulk operations and finally decided not to support them:
1. DMA engines are not well suited to small-packet scenarios with high PPS.
   PS: Vectors are suited to high PPS.
2. To support posting bulk ops, we would need to define a standard struct like
   rte_mbuf, and the application may need to initialize the struct fields and
   pass them as a pointer array, which may cost too much CPU.
3. Posting a request is simpler than processing completed operations, and CPU
   write performance is also good. A driver could still use vectors to
   accelerate the processing of completed operations.
2. Prepare the API for more complex DMA operations than just copy/fill.
E.g. blitter operations like "copy A bytes from the source starting at address X, to the destination starting at address Y, masked with the bytes starting at address Z, then skip B bytes at the source and C bytes at the destination, rewind the mask to the beginning of Z, and repeat D times". This is just an example.
I'm suggesting to use a "DMA operation" union structure as parameter to the command enqueue function, rather than having individual functions for each possible DMA operation.
There are many situations in which such a structure may be hard to define, so I prefer separate APIs like copy/fill/...
PS: I see that struct dma_device (Linux dmaengine.h) also supports various prep_xxx APIs.
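
The two shapes under discussion can be sketched side by side: a single tagged-union operation passed to one execute function, and thin per-operation functions layered on top of it. All names here (dma_op, sw_execute, sw_copy, sw_fill) are hypothetical. The masking rule, dst = (src & mask) | (dst & ~mask), and the reading of "repeat D times" as D passes in total are assumptions made only for this illustration, since the blitter example above does not define them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names: an illustration of the two API shapes, not a proposal. */
enum dma_op_type { DMA_OP_COPY, DMA_OP_FILL, DMA_OP_MASKED_BLIT };

struct dma_op {
    enum dma_op_type type;
    union {
        struct { void *dst; const void *src; size_t len; } copy;
        struct { void *dst; uint8_t pattern; size_t len; } fill;
        struct {
            uint8_t *dst;        /* Y */
            const uint8_t *src;  /* X */
            const uint8_t *mask; /* Z, rewound at the start of every pass */
            size_t len;          /* A: bytes per pass */
            size_t src_skip;     /* B: bytes skipped at the source */
            size_t dst_skip;     /* C: bytes skipped at the destination */
            size_t repeat;       /* D: assumed to be the total pass count */
        } blit;
    } u;
};

/* Union style: one entry point, software fallback executed on the CPU. */
static int sw_execute(const struct dma_op *op)
{
    switch (op->type) {
    case DMA_OP_COPY:
        memcpy(op->u.copy.dst, op->u.copy.src, op->u.copy.len);
        return 0;
    case DMA_OP_FILL:
        memset(op->u.fill.dst, op->u.fill.pattern, op->u.fill.len);
        return 0;
    case DMA_OP_MASKED_BLIT: {
        uint8_t *dst = op->u.blit.dst;
        const uint8_t *src = op->u.blit.src;
        for (size_t r = 0; r < op->u.blit.repeat; r++) {
            for (size_t i = 0; i < op->u.blit.len; i++) {
                uint8_t m = op->u.blit.mask[i];
                dst[i] = (uint8_t)((src[i] & m) | (dst[i] & ~m));
            }
            src += op->u.blit.len + op->u.blit.src_skip;
            dst += op->u.blit.len + op->u.blit.dst_skip;
        }
        return 0;
    }
    default:
        return -1; /* unknown operation type */
    }
}

/* Separate-function style: each wrapper fills in the union for the caller. */
static int sw_copy(void *dst, const void *src, size_t len)
{
    struct dma_op op;
    op.type = DMA_OP_COPY;
    op.u.copy.dst = dst;
    op.u.copy.src = src;
    op.u.copy.len = len;
    return sw_execute(&op);
}

static int sw_fill(void *dst, uint8_t pattern, size_t len)
{
    struct dma_op op;
    op.type = DMA_OP_FILL;
    op.u.fill.dst = dst;
    op.u.fill.pattern = pattern;
    op.u.fill.len = len;
    return sw_execute(&op);
}
```

The sketch shows that the two styles are not mutually exclusive: per-operation functions keep the common copy/fill path simple for callers, while a union (or further prep-style functions) leaves room for more complex operations later.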
I know I'm not the only one old enough on the mailing list to have worked with the Commodore Amiga's blitter. :-)
DPDK has lots of code using CPU vector instructions to shuffle bytes around. I can easily imagine a DMA engine doing similar jobs, possibly implemented in an FPGA or some other coprocessor.

-Morten

