Thread (79 messages), 10 authors, 2021-07-06

Re: [dpdk-dev] dmadev discussion summary

From: fengchengwen <hidden>
Date: 2021-07-02 13:45:23

On 2021/7/1 23:01, Jerin Jacob wrote:
quoted
  [key point]:
      -----------    -----------
      | channel |    | channel |
      -----------    -----------
             \           /
              \         /
               \       /
             ------------
             | HW-queue |
             ------------
                   |
                --------
                |rawdev|
                --------
      1) User could create one channel by init context(dpi_dma_queue_ctx_s),
         this interface is not standardized and needs to be implemented by
         users.
      2) Different channels can support different transmissions, e.g. one for
         inner m2m, and other for inbound copy.

      Overall, I think the 'channel' is similar to the 'virt-queue' of dpaa2_qdma.
      The difference is that dpaa2_qdma supports multiple hardware queues. The
      'channel' has following
If dpaa2_qdma supports more than one HW queue, I think it is good to
have the queue notion in DPDK, just like other DPDK device classes.
It would be good to have confirmation from the dpaa2 folks,
@Hemant Agrawal, on whether there really is more than one HW queue
in the dpaa device.


IMO, channel is a better name than virtual queue. The reason is that
virtual queue is a more implementation-specific notion; there is no
need to have it in the API specification.
In the DPDK framework, many data-plane API names contain queues, e.g. eventdev and crypto.
The virt-queue concept keeps continuity with them.
quoted
      [dma_copy/memset/sg]: all has vq_id input parameter.
         Note: I notice dpaa can't support single and sg in one virt-queue, and
               I think it may be a software implementation policy rather than
               a HW restriction, because virt-queues could share the same
               HW-queue.
      Here we use vq_id to tackle different scenarios, such as local-to-local,
      local-to-host, etc.
IMO, the index representation has additional overhead, as one needs
to translate it to a memory pointer. I prefer to avoid that by having
an object handle, and to use a _lookup() API to get the handle so that
it works in multi-process cases without the additional indirection,
like the mempool object.
This solution was considered first (similar to rte_hash returning a handle).
It is not intuitive and has no obvious performance advantage: it does not reduce
the number of jumps the data-plane API needs to reach the driver callback
compared with the index-driven approach.
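
To make the index-versus-handle trade-off above concrete, here is a minimal
sketch. All names (struct dma_dev, dev_table, copy_by_index, copy_by_handle,
dev_lookup) are hypothetical and are not part of any dmadev patch. It only
illustrates that both styles end in the same indirect call through the driver
callback; the index style adds one table lookup in front of it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVS 4 /* hypothetical device table size */

struct dma_dev;
typedef int (*copy_fn)(struct dma_dev *dev, uint16_t vq_id, uint32_t len);

struct dma_dev {
    copy_fn copy;       /* driver callback, set at probe time */
    uint32_t submitted; /* toy counter standing in for a ring index */
};

static struct dma_dev dev_table[MAX_DEVS];

/* Toy driver callback: returns the index of the submitted operation. */
static int drv_copy(struct dma_dev *dev, uint16_t vq_id, uint32_t len)
{
    (void)vq_id;
    (void)len;
    return (int)dev->submitted++;
}

/* Index-driven: one table lookup, then the indirect call. */
static int copy_by_index(uint16_t dev_id, uint16_t vq_id, uint32_t len)
{
    assert(dev_id < MAX_DEVS);
    struct dma_dev *dev = &dev_table[dev_id];
    return dev->copy(dev, vq_id, len);
}

/* Handle-driven: the caller looked the device up once beforehand,
 * but the indirect call through dev->copy is still there. */
static int copy_by_handle(struct dma_dev *dev, uint16_t vq_id, uint32_t len)
{
    assert(dev != NULL);
    return dev->copy(dev, vq_id, len);
}

/* Hypothetical _lookup() helper that turns an id into a handle. */
static struct dma_dev *dev_lookup(uint16_t dev_id)
{
    return dev_id < MAX_DEVS ? &dev_table[dev_id] : NULL;
}
```

With dev_table[1].copy set to drv_copy, copy_by_index(1, 0, 64) and
copy_by_handle(dev_lookup(1), 0, 64) reach the same callback; the difference
is only where the table lookup happens.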
quoted
  5) And the dmadev public data-plane API (just prototype):
     dma_cookie_t rte_dmadev_memset(dev, vq_id, pattern, dst, len, flags)
       -- flags: used as an extended parameter, it could be uint32_t
     dma_cookie_t rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags)
     dma_cookie_t rte_dmadev_memcpy_sg(dev, vq_id, sg, sg_len, flags)
       -- sg: struct dma_scatterlist array
     uint16_t rte_dmadev_completed(dev, vq_id, dma_cookie_t *cookie,
                                   uint16_t nb_cpls, bool *has_error)
       -- nb_cpls: indicates the maximum number of operations to process
       -- has_error: indicates whether there is an error
       -- return value: the number of successfully completed operations.
       -- example:
          1) If there are already 32 completed ops, and the 4th is an error,
             and nb_cpls is 32, then the ret will be 3 (because the 1st/2nd/3rd
             are OK), and has_error will be true.
          2) If there are already 32 completed ops, and all completed
             successfully, then the ret will be min(32, nb_cpls), and has_error
             will be false.
          3) If there are already 32 completed ops, and all of them failed,
             then the ret will be 0, and has_error will be true.
+1. IMO, it is better to call it ring_idx instead of a cookie, to make
it explicit that it is the ring index.
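
The return-value rule in the three examples above can be modelled in a few
lines. This is only a sketch of the rule as described in the summary, not the
dmadev implementation; model_completed() and the ok[] array are hypothetical
stand-ins for the driver's view of finished operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the proposed rte_dmadev_completed() return rule.
 * ok[i] is true when the i-th finished operation succeeded, nb_done is the
 * number of operations the hardware has finished, and nb_cpls is the maximum
 * number the caller wants to process. */
static uint16_t
model_completed(const bool *ok, uint16_t nb_done, uint16_t nb_cpls,
                bool *has_error)
{
    uint16_t n = 0;

    assert(ok != NULL && has_error != NULL);
    *has_error = false;

    /* Count successes until the caller's limit, the end of the finished
     * operations, or the first failed operation, whichever comes first. */
    while (n < nb_done && n < nb_cpls) {
        if (!ok[n]) {
            *has_error = true;
            break;
        }
        n++;
    }
    return n;
}
```

With 32 finished operations, this model returns 3 with has_error set when the
4th failed, min(32, nb_cpls) with has_error clear when all succeeded, and 0
with has_error set when all failed.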
quoted
     uint16_t rte_dmadev_completed_status(dev_id, vq_id, dma_cookie_t *cookie,
                                          uint16_t nb_status, uint32_t *status)
       -- return value: the number of failed completed operations.
See above. Here we are assuming it is an index; otherwise we need to
pass an array of cookies.
quoted
     And here I agree with Morten: we should design an API that adapts to DPDK
     service scenarios. So we don't support some sound-card DMA, or 2D memory
     copy, which is mainly used in video scenarios.
  6) The dma_cookie_t is a signed int type; when <0 it means error. It is
     monotonically increasing based on the HW-queue (rather than the
     virt-queue). The driver needs to ensure this because the dmadev framework
     doesn't manage the dma_cookie's creation.
+1 and see above.
quoted
  7) Because data-plane APIs are not thread-safe, and the user could determine
     the virt-queue to HW-queue mapping (at the queue-setup stage), it is the
     user's duty to ensure thread safety.
+1. But I am not sure how easy it is for a fast-path application to have
this logic. Instead, I think it is better for the driver to report the
capability per queue, and in the channel configuration the application
can state its requirement (whether multiple producers enqueue to the
same HW queue or not).
Based on the request, the implementation can pick the correct function
pointer for enqueue (locked vs. lockless version, if the HW does not
support lockless enqueue).
Already redesigned. Please check the latest patch.
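
For readers following the thread, the suggestion above (pick a locked or
lockless enqueue function at channel-configuration time) might look roughly
like the sketch below. It is not taken from the patch; struct chan,
chan_setup(), enq_locked() and enq_lockless() are hypothetical names, and a
C11 atomic_flag spinlock stands in for whatever lock a driver would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct chan;
typedef int (*enq_fn)(struct chan *c);

struct chan {
    atomic_flag lock; /* only used by the locked variant */
    uint32_t tail;    /* toy ring tail, stands in for the HW ring index */
    enq_fn enq;       /* chosen once at configuration time */
};

/* Single producer, or HW that handles concurrent enqueue itself. */
static int enq_lockless(struct chan *c)
{
    return (int)c->tail++;
}

/* Multiple producers on HW that cannot enqueue concurrently. */
static int enq_locked(struct chan *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ; /* spin until the lock is free */
    int idx = (int)c->tail++;
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
    return idx;
}

/* The application states its requirement (multi_producer); the driver
 * knows its capability (hw_lockless). The fast path then calls c->enq
 * without checking either flag again. */
static void chan_setup(struct chan *c, bool multi_producer, bool hw_lockless)
{
    assert(c != NULL);
    atomic_flag_clear(&c->lock);
    c->tail = 0;
    c->enq = (multi_producer && !hw_lockless) ? enq_locked : enq_lockless;
}
```

The point of the design is that the lock cost is paid only by applications
that asked for multi-producer enqueue on hardware that needs it.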
quoted
  8) One example:
     vq_id = rte_dmadev_queue_setup(dev, config.{HW-queue-index=x, opaque});
     if (vq_id < 0) {
        // create virt-queue failed
        return;
     }
     // submit memcpy task
     cookie = rte_dmadev_memcpy(dev, vq_id, src, dst, len, flags);
     if (cookie < 0) {
        // submit failed
        return;
     }
     // get complete task
     ret = rte_dmadev_completed(dev, vq_id, &cookie, 1, &has_error);
     if (!has_error && ret == 1) {
        // the memcpy successful complete
     }
+1
quoted
  9) As octeontx2_dma supports sg-list which has many valid buffers in
     dpi_dma_buf_ptr_s, it could call the rte_dmadev_memcpy_sg API.
+1
quoted
  10) As for ioat, it could declare support for one HW-queue at the
      dev_configure stage, and only support creating one virt-queue.
  11) As for dpaa2_qdma, I think it could migrate to the new framework, but we
      are still waiting for feedback from the dpaa2_qdma folks.
  12) About the prototype src/dst parameters of rte_dmadev_memcpy API, we have
      two candidates which are iova and void *, how about introducing a
      dma_addr_t type which could be va or iova?
I think conversion looks ugly; it is better to have void * and to expose
the constraints of void * as a limitation/capability flag, so that the
driver can update it.
Already changed to void *.
quoted
.
  