Thread (10 messages) 10 messages, 2 authors, 2021-01-14

Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test

From: Pavan Nikhilesh Bhagavatula <hidden>
Date: 2021-01-05 09:29:41

Hi Feifei,
Hi, Pavan

Sorry for my late reply and thanks very much for your review.
quoted
-----Original Message-----
From: Pavan Nikhilesh Bhagavatula <redacted>
Sent: 2020年12月22日 18:33
To: Feifei Wang <redacted>; jerinj@marvell.com;
Harry van
quoted
Haaren [off-list ref]; Pavan Nikhilesh
[off-list ref]
Cc: dev@dpdk.org; nd <redacted>; Honnappa Nagarahalli
[off-list ref]; stable@dpdk.org; Phil Yang
[off-list ref]
Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
for
quoted
pipeline test

quoted
Add release barriers before updating the processed packets for
worker
quoted
quoted
lcores to ensure the worker lcore has really finished data processing
and then it can update the processed packets number.
I believe we can live with minor inaccuracies in stats being presented
as
quoted
atomics are pretty heavy when scheduler is limited to burst size as 1.

One option is to move it before a pipeline operation
(pipeline_event_tx,
quoted
pipeline_fwd_event etc.) as they imply implicit release barrier (as all
the
quoted
changes done to the event should be visible to the next core).
If I understand correctly, your meaning is that move release barriers
before
pipeline_event_tx or pipeline_fwd_event. This can ensure the event has
been
processed before the next core begins to tx/fwd. For example:
What I meant was event APIs such as `rte_event_enqueue_burst`, `rte_event_eth_tx_adapter_enqueue`
act as an implicit release barrier and the API `rte_event_dequeue_burst` act as an implicit acquire barrier.

Since, pipeline_* test starts with a dequeue() and ends with an enqueue() I don’t believe we need barriers in 
Between.
if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
	+	__atomic_thread_fence(__ATOMIC_RELEASE);
		pipeline_event_tx(dev, port, &ev);
		w->processed_pkts++;
	} else {
		ev.queue_id++;
	+	__atomic_thread_fence(__ATOMIC_RELEASE);
		pipeline_fwd_event(&ev,
RTE_SCHED_TYPE_ATOMIC);
		pipeline_event_enqueue(dev, port, &ev);

However, there are two reasons to prevent this:

First, compare with other tests in app/eventdev, for example, the
eventdev perf test,
the wmb is after event operation to ensure operation has been finished
and then w->processed_pkts++.
In case of perf_* tests start with a dequeue() and finally ends with a mempool_put()
should also act as implicit acquire release pairs making stats consistent?
So, if we move release barriers before tx/fwd, it may cause that the
tests of app/eventdev
become  inconsistent.This may reduce the maintainability of the code
and make it difficult to understand.

Second, it is a test case, though heavy thread may cause performance
degradation, it can ensure that
the operation process and the test result are correct. And maybe for a
test case, correctness is more important
than performance.
Most of our internal perf test run on 24/48 core combinations and since 
Octeontx2 event device driver supports a burst size of 1, it will show up as
Huge performance degradation.
So, due to two reasons above, I'm ambivalent about how we should do
in the next step.

Best Regards
Feifei
Regards,
Pavan.
quoted
quoted
Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
functions")
Cc: pbhagavatula@marvell.com
Cc: stable@dpdk.org

Signed-off-by: Phil Yang <redacted>
Signed-off-by: Feifei Wang <redacted>
Reviewed-by: Ruifeng Wang <redacted>
---
app/test-eventdev/test_pipeline_queue.c | 64
+++++++++++++++++++++----
1 file changed, 56 insertions(+), 8 deletions(-)
diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-
eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
100644
quoted
quoted
--- a/app/test-eventdev/test_pipeline_queue.c
+++ b/app/test-eventdev/test_pipeline_queue.c
@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void
*arg)

		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
			pipeline_event_tx(dev, port, &ev);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
		} else {
			ev.queue_id++;
			pipeline_fwd_event(&ev,
RTE_SCHED_TYPE_ATOMIC);
@@ -59,7 +65,13 @@
pipeline_queue_worker_single_stage_fwd(void
quoted
quoted
*arg)
		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
		pipeline_event_enqueue(dev, port, &ev);
-		w->processed_pkts++;
+
+		/* release barrier here ensures stored operation
+		 * of the event completes before the number of
+		 * processed pkts is visible to the main core
+		 */
+		__atomic_fetch_add(&(w->processed_pkts), 1,
+				__ATOMIC_RELEASE);
	}

	return 0;
@@ -84,7 +96,13 @@
pipeline_queue_worker_single_stage_burst_tx(void *arg)
			if (ev[i].sched_type ==
RTE_SCHED_TYPE_ATOMIC) {
				pipeline_event_tx(dev, port, &ev[i]);
				ev[i].op = RTE_EVENT_OP_RELEASE;
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored
operation
+				 * of the event completes before the
number of
+				 * processed pkts is visible to the main
core
+				 */
+				__atomic_fetch_add(&(w-
quoted
processed_pkts), 1,
+						__ATOMIC_RELEASE);
			} else {
				ev[i].queue_id++;
				pipeline_fwd_event(&ev[i],
@@ -121,7 +139,13 @@
pipeline_queue_worker_single_stage_burst_fwd(void *arg)
		}

		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
-		w->processed_pkts += nb_rx;
+
+		/* release barrier here ensures stored operation
+		 * of the event completes before the number of
+		 * processed pkts is visible to the main core
+		 */
+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
+				__ATOMIC_RELEASE);
	}

	return 0;
@@ -146,7 +170,13 @@
pipeline_queue_worker_multi_stage_tx(void
quoted
quoted
*arg)

		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
			pipeline_event_tx(dev, port, &ev);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
			continue;
		}
@@ -180,7 +210,13 @@
pipeline_queue_worker_multi_stage_fwd(void *arg)
			ev.queue_id = tx_queue[ev.mbuf->port];
			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
			pipeline_fwd_event(&ev,
RTE_SCHED_TYPE_ATOMIC);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
		} else {
			ev.queue_id++;
			pipeline_fwd_event(&ev,
sched_type_list[cq_id]);
@@ -214,7 +250,13 @@
pipeline_queue_worker_multi_stage_burst_tx(void *arg)
			if (ev[i].queue_id == tx_queue[ev[i].mbuf-
quoted
port]) {
				pipeline_event_tx(dev, port, &ev[i]);
				ev[i].op = RTE_EVENT_OP_RELEASE;
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored
operation
+				 * of the event completes before the
number of
+				 * processed pkts is visible to the main
core
+				 */
+				__atomic_fetch_add(&(w-
quoted
processed_pkts), 1,
+						__ATOMIC_RELEASE);
				continue;
			}
@@ -254,7 +296,13 @@
pipeline_queue_worker_multi_stage_burst_fwd(void *arg)

rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
				pipeline_fwd_event(&ev[i],

RTE_SCHED_TYPE_ATOMIC);
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored
operation
+				 * of the event completes before the
number of
+				 * processed pkts is visible to the main
core
+				 */
+				__atomic_fetch_add(&(w-
quoted
processed_pkts), 1,
+						__ATOMIC_RELEASE);
			} else {
				ev[i].queue_id++;
				pipeline_fwd_event(&ev[i],
--
2.17.1
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help