Re: [dpdk-dev] [RFC PATCH] dmadev: introduce DMA device library
From: fengchengwen <hidden>
Date: 2021-06-16 10:17:14
On 2021/6/16 15:09, Morten Brørup wrote:
From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Bruce Richardson
Sent: Tuesday, 15 June 2021 18.39

On Tue, Jun 15, 2021 at 09:22:07PM +0800, Chengwen Feng wrote:
This patch introduces 'dmadevice', which is a generic type of DMA device.

The APIs of the dmadev library expose some generic operations which can enable configuration and I/O with the DMA devices.

Signed-off-by: Chengwen Feng <redacted>
---

Thanks for sending this.

Of most interest to me right now are the key data-plane APIs. While we are still in the prototyping phase, below is a draft of what we are thinking for the key enqueue/perform_ops/completed_ops APIs.

Some key differences I note in below vs your original RFC:

* Use of void pointers rather than iova addresses. While using iova's makes sense in the general case when using hardware, in that it can work with both physical addresses and virtual addresses, if we change the APIs to use void pointers instead it will still work for DPDK in VA mode, while at the same time allowing use of software fallbacks in error cases, and also a stub driver that uses memcpy in the background. Finally, using iova's makes the APIs a lot more awkward to use with anything but mbufs or similar buffers where we already have a pre-computed physical address.

* Use of id values rather than user-provided handles. Allowing the user/app to manage the amount of data stored per operation is a better solution, I feel, than prescribing a certain amount of in-driver tracking. Some apps may not care about anything other than a job being completed, while other apps may have significant metadata to be tracked. Taking the user-context handles out of the API also makes the driver code simpler.

* I've kept a single combined API for completions, which differs from the separate error handling completion API you propose. I need to give the two function approach a bit of thought, but likely both could work. If we (likely) never expect failed ops, then the specifics of error handling should not matter that much.

For the rest, the control / setup APIs are likely to be rather uncontroversial, I suspect.
However, I think that rather than xstats APIs, the library should first provide a set of standardized stats like ethdev does. If driver-specific stats are needed, we can add xstats later to the API.

Appreciate your further thoughts on this, thanks.

Regards,
/Bruce

I generally agree with Bruce's points above. I would like to share a couple of ideas for further discussion:

1. API for bulk operations. The ability to prepare a vector of DMA operations, and then post it to the DMA driver.
We considered bulk operations and finally decided not to support them, for these reasons:

1. A DMA engine is not well suited to small-packet scenarios with high PPS. (PS: vectors are what suit high PPS.)
2. To support posting bulk ops, we would need to define a standard struct like rte_mbuf, and the application may need to initialize the struct fields and pass them as a pointer array; this may cost too much CPU.
3. Posting a request is simpler than processing completed operations, and CPU write performance is also good. The driver could still use vectors to accelerate the processing of completed operations.
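To make the per-operation model above concrete, here is a minimal sketch of an enqueue/perform_ops/completed_ops flow that uses void pointers and returns id values, backed by a memcpy stub as Bruce mentions. Every name here (sketch_dma, sketch_enqueue_copy, and so on) is hypothetical and is not the proposed dmadev API; the ring size and the 16-bit id width are assumptions made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RING_SIZE 64 /* hypothetical; must divide 65536 evenly */

struct sketch_op {
    void *dst;
    const void *src;
    size_t len;
};

/* Hypothetical software "device": all three counters are free-running ids. */
struct sketch_dma {
    struct sketch_op ring[SKETCH_RING_SIZE];
    uint16_t next_id;   /* id the next enqueue will return */
    uint16_t submitted; /* ops with ids before this have been executed */
    uint16_t completed; /* ops with ids before this were reported to the app */
};

/* Queue one copy. Returns the job id, or -1 if the ring is full.
 * Nothing is copied yet; the app tracks its own metadata by id. */
static int sketch_enqueue_copy(struct sketch_dma *d, void *dst,
                               const void *src, size_t len)
{
    assert(d != NULL);
    if ((uint16_t)(d->next_id - d->completed) >= SKETCH_RING_SIZE)
        return -1;
    struct sketch_op *op = &d->ring[d->next_id % SKETCH_RING_SIZE];
    op->dst = dst;
    op->src = src;
    op->len = len;
    return d->next_id++;
}

/* "Doorbell": the stub simply runs memcpy for every queued op. */
static void sketch_perform_ops(struct sketch_dma *d)
{
    assert(d != NULL);
    while (d->submitted != d->next_id) {
        struct sketch_op *op = &d->ring[d->submitted % SKETCH_RING_SIZE];
        memcpy(op->dst, op->src, op->len);
        d->submitted++;
    }
}

/* Report up to max finished ops; *last_id receives the newest finished id.
 * Returns the number of ops reported (0 leaves *last_id untouched). */
static uint16_t sketch_completed_ops(struct sketch_dma *d, uint16_t max,
                                     uint16_t *last_id)
{
    assert(d != NULL && last_id != NULL);
    uint16_t n = (uint16_t)(d->submitted - d->completed);
    if (n > max)
        n = max;
    if (n == 0)
        return 0;
    d->completed = (uint16_t)(d->completed + n);
    *last_id = (uint16_t)(d->completed - 1);
    return n;
}
```

In this shape the application calls sketch_enqueue_copy() once per operation and sketch_perform_ops() once per burst, so no array of descriptor structs has to be built on the application side, while completions can still be reaped in batches.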
2. Prepare the API for more complex DMA operations than just copy/fill. E.g. blitter operations like "copy A bytes from the source starting at address X, to the destination starting at address Y, masked with the bytes starting at address Z, then skip B bytes at the source and C bytes at the destination, rewind the mask to the beginning of Z, and repeat D times". This is just an example. I'm suggesting to use a "DMA operation" union structure as parameter to the command enqueue function, rather than having individual functions for each possible DMA operation.
There are many situations in which it may be hard to define such a structure, so I prefer separate APIs like copy/fill/... PS: I saw that struct dma_device (Linux dmaengine.h) also supports various prep_xxx APIs.
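The two API shapes being debated can be sketched side by side. This is only an illustration under assumptions: the type and function names are hypothetical, only copy and fill are modelled, and both variants execute the operation immediately with memcpy/memset instead of queuing it to hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Shape A (Morten's suggestion): one enqueue function, one tagged union. */
enum sketch_op_type { SKETCH_OP_COPY, SKETCH_OP_FILL };

struct sketch_op {
    enum sketch_op_type type;
    union {
        struct { void *dst; const void *src; size_t len; } copy;
        struct { void *dst; uint8_t pattern; size_t len; } fill;
    } u;
};

static int sketch_enqueue(const struct sketch_op *op)
{
    assert(op != NULL);
    switch (op->type) {
    case SKETCH_OP_COPY:
        memcpy(op->u.copy.dst, op->u.copy.src, op->u.copy.len);
        return 0;
    case SKETCH_OP_FILL:
        memset(op->u.fill.dst, op->u.fill.pattern, op->u.fill.len);
        return 0;
    }
    return -1; /* unknown operation type */
}

/* Shape B (the reply's preference): one function per operation. */
static int sketch_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return 0;
}

static int sketch_fill(void *dst, uint8_t pattern, size_t len)
{
    memset(dst, pattern, len);
    return 0;
}
```

Shape A keeps the function table small but makes every caller fill in a descriptor and every driver dispatch on the type; shape B passes arguments directly, at the cost of one entry point per supported operation.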
I know I'm not the only one old enough on the mailing list to have worked with the Commodore Amiga's blitter. :-)

DPDK has lots of code using CPU vector instructions to shuffle bytes around. I can easily imagine a DMA engine doing similar jobs, possibly implemented in an FPGA or some other coprocessor.

-Morten
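For readers unfamiliar with blitters, the masked strided copy described earlier in this thread (A, B, C, D, X, Y, Z) could be expressed in software roughly as follows. This is a sketch under stated assumptions: the struct and function names are hypothetical, "masked" is taken to mean that mask bits select source bits over destination bits, and D is taken to be the total number of rows processed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; letters refer to the description in the thread. */
struct sketch_blit {
    const uint8_t *src;  /* X: source start */
    uint8_t *dst;        /* Y: destination start */
    const uint8_t *mask; /* Z: mask, reused from its start for every row */
    size_t row_len;      /* A: bytes copied per row */
    size_t src_skip;     /* B: bytes skipped at the source after each row */
    size_t dst_skip;     /* C: bytes skipped at the destination after each row */
    size_t rows;         /* D: number of rows (assumed to be the total count) */
};

/* Software reference: for each row, take source bits where the mask bit is
 * set and keep destination bits where it is clear (assumed semantics). */
static void sketch_blit_sw(const struct sketch_blit *b)
{
    assert(b != NULL);
    size_t s_off = 0;
    size_t d_off = 0;
    for (size_t r = 0; r < b->rows; r++) {
        for (size_t i = 0; i < b->row_len; i++) {
            uint8_t m = b->mask[i];
            uint8_t s = b->src[s_off + i];
            uint8_t d = b->dst[d_off + i];
            b->dst[d_off + i] = (uint8_t)((d & (uint8_t)~m) | (s & m));
        }
        s_off += b->row_len + b->src_skip;
        d_off += b->row_len + b->dst_skip;
    }
}
```

A descriptor like this has seven fields for a single operation type, which gives a feel for how quickly a union covering many such operations could grow.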