Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for... | dpdk-dev

Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test

From: Pavan Nikhilesh Bhagavatula <hidden>
Date: 2021-01-05 09:29:41

Hi Feifei,

Hi, Pavan

Sorry for my late reply and thanks very much for your review.

quoted

-----Original Message-----
From: Pavan Nikhilesh Bhagavatula <redacted>
Sent: 2020年12月22日 18:33
To: Feifei Wang <redacted>; jerinj@marvell.com;

Harry van

quoted

Haaren [off-list ref]; Pavan Nikhilesh
[off-list ref]
Cc: dev@dpdk.org; nd <redacted>; Honnappa Nagarahalli
[off-list ref]; stable@dpdk.org; Phil Yang
[off-list ref]
Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers

for

quoted

pipeline test

quoted

Add release barriers before updating the processed packets for

worker

quoted

lcores to ensure the worker lcore has really finished data processing
and then it can update the processed packets number.

I believe we can live with minor inaccuracies in stats being presented

as

quoted

atomics are pretty heavy when scheduler is limited to burst size as 1.

One option is to move it before a pipeline operation

(pipeline_event_tx,

quoted

pipeline_fwd_event etc.) as they imply implicit release barrier (as all

the

quoted

changes done to the event should be visible to the next core).

If I understand correctly, your meaning is that move release barriers
before
pipeline_event_tx or pipeline_fwd_event. This can ensure the event has
been
processed before the next core begins to tx/fwd. For example:

What I meant was event APIs such as `rte_event_enqueue_burst`, `rte_event_eth_tx_adapter_enqueue`
act as an implicit release barrier and the API `rte_event_dequeue_burst` act as an implicit acquire barrier.

Since, pipeline_* test starts with a dequeue() and ends with an enqueue() I don’t believe we need barriers in 
Between.

if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
	+	__atomic_thread_fence(__ATOMIC_RELEASE);
		pipeline_event_tx(dev, port, &ev);
		w->processed_pkts++;
	} else {
		ev.queue_id++;
	+	__atomic_thread_fence(__ATOMIC_RELEASE);
		pipeline_fwd_event(&ev,
RTE_SCHED_TYPE_ATOMIC);
		pipeline_event_enqueue(dev, port, &ev);

However, there are two reasons to prevent this:

First, compare with other tests in app/eventdev, for example, the
eventdev perf test,
the wmb is after event operation to ensure operation has been finished
and then w->processed_pkts++.

In case of perf_* tests start with a dequeue() and finally ends with a mempool_put()
should also act as implicit acquire release pairs making stats consistent?

So, if we move release barriers before tx/fwd, it may cause that the
tests of app/eventdev
become  inconsistent.This may reduce the maintainability of the code
and make it difficult to understand.

Second, it is a test case, though heavy thread may cause performance
degradation, it can ensure that
the operation process and the test result are correct. And maybe for a
test case, correctness is more important
than performance.

Most of our internal perf test run on 24/48 core combinations and since 
Octeontx2 event device driver supports a burst size of 1, it will show up as
Huge performance degradation.

So, due to two reasons above, I'm ambivalent about how we should do
in the next step.

Best Regards
Feifei

Regards,
Pavan.

quoted

Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
functions")
Cc: pbhagavatula@marvell.com
Cc: stable@dpdk.org

Signed-off-by: Phil Yang <redacted>
Signed-off-by: Feifei Wang <redacted>
Reviewed-by: Ruifeng Wang <redacted>
---
app/test-eventdev/test_pipeline_queue.c | 64
+++++++++++++++++++++----
1 file changed, 56 insertions(+), 8 deletions(-)

diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-

eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb

quoted

--- a/app/test-eventdev/test_pipeline_queue.c
+++ b/app/test-eventdev/test_pipeline_queue.c

@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void

*arg)

		if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
			pipeline_event_tx(dev, port, &ev);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
		} else {
			ev.queue_id++;
			pipeline_fwd_event(&ev,
RTE_SCHED_TYPE_ATOMIC);

@@ -59,7 +65,13 @@

pipeline_queue_worker_single_stage_fwd(void

quoted

*arg)
		rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
		pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
		pipeline_event_enqueue(dev, port, &ev);
-		w->processed_pkts++;
+
+		/* release barrier here ensures stored operation
+		 * of the event completes before the number of
+		 * processed pkts is visible to the main core
+		 */
+		__atomic_fetch_add(&(w->processed_pkts), 1,
+				__ATOMIC_RELEASE);
	}

	return 0;

@@ -84,7 +96,13 @@

pipeline_queue_worker_single_stage_burst_tx(void *arg)
			if (ev[i].sched_type ==
RTE_SCHED_TYPE_ATOMIC) {
				pipeline_event_tx(dev, port, &ev[i]);
				ev[i].op = RTE_EVENT_OP_RELEASE;
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored
operation
+				 * of the event completes before the
number of
+				 * processed pkts is visible to the main
core
+				 */
+				__atomic_fetch_add(&(w-

quoted

processed_pkts), 1,

+						__ATOMIC_RELEASE);
			} else {
				ev[i].queue_id++;
				pipeline_fwd_event(&ev[i],

@@ -121,7 +139,13 @@

pipeline_queue_worker_single_stage_burst_fwd(void *arg)
		}

		pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
-		w->processed_pkts += nb_rx;
+
+		/* release barrier here ensures stored operation
+		 * of the event completes before the number of
+		 * processed pkts is visible to the main core
+		 */
+		__atomic_fetch_add(&(w->processed_pkts), nb_rx,
+				__ATOMIC_RELEASE);
	}

	return 0;

@@ -146,7 +170,13 @@

pipeline_queue_worker_multi_stage_tx(void

quoted

*arg)

		if (ev.queue_id == tx_queue[ev.mbuf->port]) {
			pipeline_event_tx(dev, port, &ev);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
			continue;
		}

@@ -180,7 +210,13 @@

pipeline_queue_worker_multi_stage_fwd(void *arg)
			ev.queue_id = tx_queue[ev.mbuf->port];
			rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
			pipeline_fwd_event(&ev,
RTE_SCHED_TYPE_ATOMIC);
-			w->processed_pkts++;
+
+			/* release barrier here ensures stored operation
+			 * of the event completes before the number of
+			 * processed pkts is visible to the main core
+			 */
+			__atomic_fetch_add(&(w->processed_pkts), 1,
+					__ATOMIC_RELEASE);
		} else {
			ev.queue_id++;
			pipeline_fwd_event(&ev,
sched_type_list[cq_id]);

@@ -214,7 +250,13 @@

pipeline_queue_worker_multi_stage_burst_tx(void *arg)
			if (ev[i].queue_id == tx_queue[ev[i].mbuf-

quoted

port]) {

				pipeline_event_tx(dev, port, &ev[i]);
				ev[i].op = RTE_EVENT_OP_RELEASE;
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored
operation
+				 * of the event completes before the
number of
+				 * processed pkts is visible to the main
core
+				 */
+				__atomic_fetch_add(&(w-

quoted

processed_pkts), 1,

+						__ATOMIC_RELEASE);
				continue;
			}

@@ -254,7 +296,13 @@

pipeline_queue_worker_multi_stage_burst_fwd(void *arg)

rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
				pipeline_fwd_event(&ev[i],

RTE_SCHED_TYPE_ATOMIC);
-				w->processed_pkts++;
+
+				/* release barrier here ensures stored
operation
+				 * of the event completes before the
number of
+				 * processed pkts is visible to the main
core
+				 */
+				__atomic_fetch_add(&(w-

quoted

processed_pkts), 1,

+						__ATOMIC_RELEASE);
			} else {
				ev[i].queue_id++;
				pipeline_fwd_event(&ev[i],
--
2.17.1

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help