Thread (109 messages) 109 messages, 9 authors, 2017-02-07

Re: [PATCH 1/4] eventdev: introduce event driven programming model

From: Bruce Richardson <hidden>
Date: 2016-11-25 11:00:57

On Fri, Nov 25, 2016 at 05:53:34AM +0530, Jerin Jacob wrote:
On Thu, Nov 24, 2016 at 04:35:56PM +0100, Thomas Monjalon wrote:
quoted
2016-11-24 07:29, Jerin Jacob:
quoted
On Wed, Nov 23, 2016 at 07:39:09PM +0100, Thomas Monjalon wrote:
quoted
2016-11-18 11:14, Jerin Jacob:
quoted
+Eventdev API - EXPERIMENTAL
+M: Jerin Jacob [off-list ref]
+F: lib/librte_eventdev/
OK to mark it experimental.
What is the plan to remove the experimental word?
IMO, EXPERIMENTAL status can be changed when
- At least two event drivers available(Intel and Cavium are working on
  SW and HW event drivers)
- Functional test applications are fine with at least two drivers
- Portable example application to showcase the features of the library
- eventdev integration with another dpdk subsystem such as ethdev

Thoughts?. I am not sure the criteria used in cryptodev case.
Sounds good.
We will be more confident when drivers and tests will be implemented.

I think the roadmap for the SW driver targets the release 17.05.
Do you still plan 17.02 for this API and the Cavium driver?
No. 17.02 too short for up-streaming the Cavium driver.However, I think API and
skeleton event driver can go in 17.02 if there are no objections.
quoted
quoted
quoted
quoted
+#define EVENTDEV_NAME_SKELETON_PMD event_skeleton
+/**< Skeleton event device PMD name */
I do not understand this #define.
Applications can explicitly request the a specific driver though driver
name. This will go as argument to rte_event_dev_get_dev_id(const char *name).
The reason for keeping this #define in rte_eventdev.h is that,
application needs to include only rte_eventdev.h not rte_eventdev_pmd.h.
So each driver must register its name in the API?
Is it really needed?
Otherwise how application knows the name of the driver.
The similar scheme used in cryptodev.
http://dpdk.org/browse/dpdk/tree/lib/librte_cryptodev/rte_cryptodev.h#n53
No strong opinion here. Open for suggestions.
I like having a name registered. I think we need a scheme where an app
can find and use an implementation using a specific driver.
quoted
quoted
quoted
quoted
+struct rte_event_dev_config {
+	uint32_t dequeue_wait_ns;
+	/**< rte_event_dequeue() wait for *dequeue_wait_ns* ns on this device.
Please explain exactly when the wait occurs and why.
Here is the explanation from rte_event_dequeue() API definition,
-
@param wait
0 - no-wait, returns immediately if there is no event.
quoted
0 - wait for the event, if the device is configured with
RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT then this function will wait until
the event available or *wait* time.
if the device is not configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT
then this function will wait until the event available or *dequeue_wait_ns*
                                                      ^^^^^^^^^^^^^^^^^^^^^^
ns which was previously supplied to rte_event_dev_configure()
-
This is provides the application to have control over, how long the
implementation should wait if event is not available.

Let me know what exact changes are required if details are not enough in
rte_event_dequeue() API definition.
Maybe that timeout would be a better name.
It waits only if there is nothing in the queue.
It can be interesting to highlight in this comment that this parameter
makes the dequeue function a blocking call.
OK. I will change to timeout then
quoted
quoted
quoted
quoted
+/** Event port configuration structure */
+struct rte_event_port_conf {
+	int32_t new_event_threshold;
+	/**< A backpressure threshold for new event enqueues on this port.
+	 * Use for *closed system* event dev where event capacity is limited,
+	 * and cannot exceed the capacity of the event dev.
+	 * Configuring ports with different thresholds can make higher priority
+	 * traffic less likely to  be backpressured.
+	 * For example, a port used to inject NIC Rx packets into the event dev
+	 * can have a lower threshold so as not to overwhelm the device,
+	 * while ports used for worker pools can have a higher threshold.
+	 * This value cannot exceed the *nb_events_limit*
+	 * which previously supplied to rte_event_dev_configure()
+	 */
+	uint8_t dequeue_depth;
+	/**< Configure number of bulk dequeues for this event port.
+	 * This value cannot exceed the *nb_event_port_dequeue_depth*
+	 * which previously supplied to rte_event_dev_configure()
+	 */
+	uint8_t enqueue_depth;
+	/**< Configure number of bulk enqueues for this event port.
+	 * This value cannot exceed the *nb_event_port_enqueue_depth*
+	 * which previously supplied to rte_event_dev_configure()
+	 */
+};
The depth configuration is not clear to me.
Basically the maximum number of events can be enqueued/dequeued at time
from a given event port. depth of one == non burst mode.
OK so depth is the queue size. Please could you reword?
OK
quoted
quoted
quoted
quoted
+/* Event types to classify the event source */
Why this classification is needed?
This for application pipeling and the cases like, if application wants to know which
subsystem generated the event.

example packet forwarding loop on the worker cores:
while(1) {
	ev = dequeue()
	// event from ethdev subsystem
	if (ev.event_type == RTE_EVENT_TYPE_ETHDEV) {
		- swap the mac address
		- push to atomic queue for ingress flow order maintenance
		  by CORE
	/* events from core */
	} else if (ev.event_type == RTE_EVENT_TYPE_CORE) {

	}
	enqueue(ev);
}
I don't know why but I feel this classification is weak.
You need to track the source of the event. Does it make sense to go beyond
and identify the source device?
No, dequeue has dev_id argument, so event comes only from that device
quoted
quoted
quoted
quoted
+#define RTE_EVENT_TYPE_ETHDEV           0x0
+/**< The event generated from ethdev subsystem */
+#define RTE_EVENT_TYPE_CRYPTODEV        0x1
+/**< The event generated from crypodev subsystem */
+#define RTE_EVENT_TYPE_TIMERDEV         0x2
+/**< The event generated from timerdev subsystem */
+#define RTE_EVENT_TYPE_CORE             0x3
+/**< The event generated from core.
What is core?
The event are generated by lcore for pipeling. Any suggestion for
better name? lcore?
What about CPU or SW?
No strong opinion here. I will go with CPU then
If you have no strong opinion, I think I'd prefer SW to CPU, as the main
difference to my mind is that this comes from another SW entity rather
than a hardware block.
quoted
quoted
quoted
quoted
+		/**< Opaque event pointer */
+		struct rte_mbuf *mbuf;
+		/**< mbuf pointer if dequeued event is associated with mbuf */
How do we know that an event is associated with mbuf?
By looking at the event source/type RTE_EVENT_TYPE_*
quoted
Does it mean that such events are always converted into mbuf even if the
application does not need it?
Hardware has dependency on getting physical address of the event, so any
struct that has "phys_addr_t buf_physaddr" works.
I do not understand.
In HW based implementations, the event pointer will be submitted to HW.
As you know, since HW can't understand the virtual address and it needs
to converted to the physical address, any DPDK object that provides phys_addr_t
such as mbuf can be used with libeventdev.
quoted
I tought that decoding the event would be the responsibility of the app
by calling a function like
rte_eventdev_convert_to_mbuf(struct rte_event *, struct rte_mbuf *).
It can be. But it is costly.i.e Yet another function pointer based
driver interface on fastpath. Instead, if the driver itself can
convert to mbuf(in case of ETHDEV device) and tag the source/event type
as RTE_EVENT_TYPE_ETHDEV.
IMO the proposed schemed helps in SW based implementation as their no real
mbuf conversation. Something we can revisit in ethdev integration if
required.
quoted
quoted
quoted
quoted
+struct rte_eventdev_driver;
+struct rte_eventdev_ops;
I think it is better to split API and driver interface in two files.
(we should do this split in ethdev)
I thought so, but then the "static inline" versions of northbound
API(like rte_event_enqueue) will go another file(due to the fact that
implementation need to deference "dev->data->ports[port_id]"). Do you want that way?
I would like to keep all northbound API in rte_eventdev.h and not any of them
in rte_eventdev_pmd.h.
My comment was confusing.
You are doing 2 files, one for API (what you call northbound I think)
and the other one for driver interface (what you call southbound I think),
it's very fine.
quoted
quoted
quoted
+/**
+ * Enqueue the event object supplied in the *rte_event* structure on an
+ * event device designated by its *dev_id* through the event port specified by
+ * *port_id*. The event object specifies the event queue on which this
+ * event will be enqueued.
+ *
+ * @param dev_id
+ *   Event device identifier.
+ * @param port_id
+ *   The identifier of the event port.
+ * @param ev
+ *   Pointer to struct rte_event
+ *
+ * @return
+ *  - 0 on success
+ *  - <0 on failure. Failure can occur if the event port's output queue is
+ *     backpressured, for instance.
+ */
+static inline int
+rte_event_enqueue(uint8_t dev_id, uint8_t port_id, struct rte_event *ev)
Is it really needed to have non-burst variant of enqueue/dequeue?
Yes. certain HW can work only with non burst variants.
Same comment as Bruce, we must keep only the burst variant.
We cannot have different API for different HW.
I don't think there is any portability issue here, I can explain.

The application level, we have two more use case to deal with non burst
variant

- latency critical work
- on dequeue, if application wants to deal with only one flow(i.e to
  avoid processing two different application flows to avoid cache trashing)

Selection of the burst variants will be based on
rte_event_dev_info_get() and rte_event_dev_configure()(see, max_event_port_dequeue_depth,
max_event_port_enqueue_depth, nb_event_port_dequeue_depth, nb_event_port_enqueue_depth )
So I don't think their is portability issue here and I don't want to waste my
CPU cycles on the for loop if application known to be working with non
bursts variant like below
If the application is known to be working on non-burst varients, then
they always request a burst-size of 1, and skip the loop completely.
There is no extra performance hit in that case in either the app or the
driver (since the non-burst driver always returns 1, irrespective of the
number requested).
nb_events = rte_event_dequeue_burst();
for(i=0; i < nb_events; i++){
	process ev[i]
}

And mostly importantly the NPU can get almost same throughput
without burst variant so why not?
quoted
quoted
quoted
quoted
+/**
+ * Converts nanoseconds to *wait* value for rte_event_dequeue()
+ *
+ * If the device is configured with RTE_EVENT_DEV_CFG_PER_DEQUEUE_WAIT flag then
+ * application can use this function to convert wait value in nanoseconds to
+ * implementations specific wait value supplied in rte_event_dequeue()
Why is it implementation-specific?
Why this conversion is not internal in the driver?
This is for performance optimization, otherwise in drivers
need to convert ns to ticks in "fast path"
So why not defining the unit of this timeout as CPU cycles like the ones
returned by rte_get_timer_cycles()?
Because HW co-processor can run in different clock domain. Need not be at
CPU frequency.
While I've no huge objection to this API, since it will not be
implemented by our SW implementation, I'm just curious as to how much
having this will save. How complicated is the arithmetic that needs to
be done, and how many cycles on your platform is that going to take?

/Bruce
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help