Thread (339 messages) 339 messages, 17 authors, 2021-10-17

Re: [dpdk-dev] [PATCH v2] dmadev: introduce DMA device library

From: Jerin Jacob <hidden>
Date: 2021-07-13 08:59:38

On Mon, Jul 12, 2021 at 10:30 PM Bruce Richardson
[off-list ref] wrote:
On Mon, Jul 12, 2021 at 10:04:07PM +0530, Jerin Jacob wrote:
quoted
On Mon, Jul 12, 2021 at 7:02 PM Bruce Richardson
[off-list ref] wrote:
quoted
On Mon, Jul 12, 2021 at 03:29:27PM +0530, Jerin Jacob wrote:
quoted
On Sun, Jul 11, 2021 at 2:59 PM Chengwen Feng [off-list ref] wrote:
quoted
This patch introduce 'dmadevice' which is a generic type of DMA
device.

The APIs of dmadev library exposes some generic operations which can
enable configuration and I/O with the DMA devices.

Signed-off-by: Chengwen Feng <redacted>
---
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Enqueue a fill operation onto the virtual DMA channel.
+ *
+ * This queues up a fill operation to be performed by hardware, but does not
+ * trigger hardware to begin that operation.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param pattern
+ *   The pattern to populate the destination buffer with.
+ * @param dst
+ *   The address of the destination buffer.
+ * @param length
+ *   The length of the destination buffer.
+ * @param flags
+ *   An flags for this operation.
+ *
+ * @return
+ *   - 0..UINT16_MAX: index of enqueued copy job.
fill job
quoted
+ *   - <0: Error code returned by the driver copy function.
+ */
+__rte_experimental
+static inline int
+rte_dmadev_fill(uint16_t dev_id, uint16_t vchan, uint64_t pattern,
+               rte_iova_t dst, uint32_t length, uint64_t flags)
+{
+       struct rte_dmadev *dev = &rte_dmadevices[dev_id];
+#ifdef RTE_DMADEV_DEBUG
+       RTE_DMADEV_VALID_DEV_ID_OR_ERR_RET(dev_id, -EINVAL);
+       RTE_FUNC_PTR_OR_ERR_RET(*dev->fill, -ENOTSUP);
+       if (vchan >= dev->data->dev_conf.max_vchans) {
+               RTE_DMADEV_LOG(ERR, "Invalid vchan %d\n", vchan);
+               return -EINVAL;
+       }
+#endif
+       return (*dev->fill)(dev, vchan, pattern, dst, length, flags);
+}
+
quoted
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Returns the number of operations that have been successfully completed.
+ *
+ * @param dev_id
+ *   The identifier of the device.
+ * @param vchan
+ *   The identifier of virtual DMA channel.
+ * @param nb_cpls
+ *   The maximum number of completed operations that can be processed.
+ * @param[out] last_idx
+ *   The last completed operation's index.
+ *   If not required, NULL can be passed in.
This means the driver will be tracking the last index.
Driver will be doing this anyway, no, since it needs to ensure we don't
Yes.
quoted
wrap around?
quoted
quoted
Is that mean, the application needs to call this API periodically to
consume the completion slot.
I.e up to 64K (UINT16_MAX)  outstanding jobs are possible. If the
application fails to call this
quoted
64K outstand job then the subsequence enqueue will fail.
Well, given that there will be a regular enqueue ring which will probably
be <= 64k in size, the completion call will need to be called frequently
anyway. I don't think we need to document this restriction as it's fairly
understood that you can't go beyond the size of the ring without cleanup.

See below.
quoted
quoted
If so, we need to document this.

One of the concerns of keeping UINT16_MAX as the limit is the
completion memory will always not in cache.
On the other hand, if we make this size programmable. it may introduce
complexity in the application.

Thoughts?
The reason for using powers-of-2 sizes, e.g. 0 .. UINT16_MAX, is that the
ring can be any other power-of-2 size and we can index it just by masking.
In the sample app for dmadev, I expect the ring size used to be set the
same as the dmadev enqueue ring size, for simplicity.
No question on not using power of 2. Aligned on that.

At least in our HW, the size of the ring is rte_dmadev_vchan_conf::nb_desc.
But completion happens in _different_ memory space. Currently, we are allocating
UINT16_MAX entries to hold that. That's where cache miss aspects of
completion aspects
came.
Depending on HW, our completions can be written back to a separate memory
area - a completion ring, if you will - but I've generally found it works
as well to reuse the enqueue ring for that purpose. However, with a
separate memory area for completions, why do you need to allocate 64K
entries for the completions? Would nb_desc entries not be enough? Is that
to allow the user to have more than nb_desc jobs outstanding before calling
"get_completions" API?
Yes. That's what I thought. Thats where my question on what is the max number of
outstanding completions. I thought it can be up to 64K. Agree to keep
it implementation-specific and not need to highlight this in the
documentation.

quoted
In your case, Is completion happens in the same ring memory(looks like
one bit in job desc represents the job completed or not) ?
And when application calls rte_dmadev_completed(), You  are converting
UINT16_MAX based index to
rte_dmadev_vchan_conf::nb_desc. Right?
Yes, we are masking to do that. Actually, for simplicity and perf we should
only allow power-of-2 ring sizes. Having to use modulus instead of masking
could be a problem. [Alternatively, I suppose we can allow drivers to round
up the ring sizes to the next power of 2, but I prefer just documenting it
as a limitation].
OK.
/Bruce
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help