Re: [dpdk-dev] [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
From: Pavan Nikhilesh Bhagavatula <hidden>
Date: 2021-01-05 09:29:41
Hi Feifei,
Hi, Pavan Sorry for my late reply and thanks very much for your review.quoted
-----Original Message----- From: Pavan Nikhilesh Bhagavatula <redacted> Sent: 2020年12月22日 18:33 To: Feifei Wang <redacted>; jerinj@marvell.com;Harry vanquoted
Haaren [off-list ref]; Pavan Nikhilesh [off-list ref] Cc: dev@dpdk.org; nd <redacted>; Honnappa Nagarahalli [off-list ref]; stable@dpdk.org; Phil Yang [off-list ref] Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriersforquoted
pipeline testquoted
Add release barriers before updating the processed packets forworkerquoted
quoted
lcores to ensure the worker lcore has really finished data processing and then it can update the processed packets number.I believe we can live with minor inaccuracies in stats being presentedasquoted
atomics are pretty heavy when scheduler is limited to burst size as 1. One option is to move it before a pipeline operation(pipeline_event_tx,quoted
pipeline_fwd_event etc.) as they imply implicit release barrier (as allthequoted
changes done to the event should be visible to the next core).If I understand correctly, your meaning is that move release barriers before pipeline_event_tx or pipeline_fwd_event. This can ensure the event has been processed before the next core begins to tx/fwd. For example:
What I meant was event APIs such as `rte_event_enqueue_burst`, `rte_event_eth_tx_adapter_enqueue` act as an implicit release barrier and the API `rte_event_dequeue_burst` act as an implicit acquire barrier. Since, pipeline_* test starts with a dequeue() and ends with an enqueue() I don’t believe we need barriers in Between.
if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
+ __atomic_thread_fence(__ATOMIC_RELEASE);
pipeline_event_tx(dev, port, &ev);
w->processed_pkts++;
} else {
ev.queue_id++;
+ __atomic_thread_fence(__ATOMIC_RELEASE);
pipeline_fwd_event(&ev,
RTE_SCHED_TYPE_ATOMIC);
pipeline_event_enqueue(dev, port, &ev);
However, there are two reasons to prevent this:
First, compare with other tests in app/eventdev, for example, the
eventdev perf test,
the wmb is after event operation to ensure operation has been finished
and then w->processed_pkts++.In case of perf_* tests start with a dequeue() and finally ends with a mempool_put() should also act as implicit acquire release pairs making stats consistent?
So, if we move release barriers before tx/fwd, it may cause that the tests of app/eventdev become inconsistent.This may reduce the maintainability of the code and make it difficult to understand. Second, it is a test case, though heavy thread may cause performance degradation, it can ensure that the operation process and the test result are correct. And maybe for a test case, correctness is more important than performance.
Most of our internal perf test run on 24/48 core combinations and since Octeontx2 event device driver supports a burst size of 1, it will show up as Huge performance degradation.
So, due to two reasons above, I'm ambivalent about how we should do in the next step. Best Regards Feifei
Regards, Pavan.
quoted
quoted
Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker functions") Cc: pbhagavatula@marvell.com Cc: stable@dpdk.org Signed-off-by: Phil Yang <redacted> Signed-off-by: Feifei Wang <redacted> Reviewed-by: Ruifeng Wang <redacted> --- app/test-eventdev/test_pipeline_queue.c | 64 +++++++++++++++++++++---- 1 file changed, 56 insertions(+), 8 deletions(-)diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb100644quoted
quoted
--- a/app/test-eventdev/test_pipeline_queue.c +++ b/app/test-eventdev/test_pipeline_queue.c@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void*arg) if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) { pipeline_event_tx(dev, port, &ev); - w->processed_pkts++; + + /* release barrier here ensures stored operation + * of the event completes before the number of + * processed pkts is visible to the main core + */ + __atomic_fetch_add(&(w->processed_pkts), 1, + __ATOMIC_RELEASE); } else { ev.queue_id++; pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);@@ -59,7 +65,13 @@pipeline_queue_worker_single_stage_fwd(voidquoted
quoted
*arg) rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0); pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC); pipeline_event_enqueue(dev, port, &ev); - w->processed_pkts++; + + /* release barrier here ensures stored operation + * of the event completes before the number of + * processed pkts is visible to the main core + */ + __atomic_fetch_add(&(w->processed_pkts), 1, + __ATOMIC_RELEASE); } return 0;@@ -84,7 +96,13 @@pipeline_queue_worker_single_stage_burst_tx(void *arg) if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) { pipeline_event_tx(dev, port, &ev[i]); ev[i].op = RTE_EVENT_OP_RELEASE; - w->processed_pkts++; + + /* release barrier here ensures stored operation + * of the event completes before the number of + * processed pkts is visible to the main core + */ + __atomic_fetch_add(&(w-quoted
processed_pkts), 1,+ __ATOMIC_RELEASE); } else { ev[i].queue_id++; pipeline_fwd_event(&ev[i],@@ -121,7 +139,13 @@pipeline_queue_worker_single_stage_burst_fwd(void *arg) } pipeline_event_enqueue_burst(dev, port, ev, nb_rx); - w->processed_pkts += nb_rx; + + /* release barrier here ensures stored operation + * of the event completes before the number of + * processed pkts is visible to the main core + */ + __atomic_fetch_add(&(w->processed_pkts), nb_rx, + __ATOMIC_RELEASE); } return 0;@@ -146,7 +170,13 @@pipeline_queue_worker_multi_stage_tx(voidquoted
quoted
*arg) if (ev.queue_id == tx_queue[ev.mbuf->port]) { pipeline_event_tx(dev, port, &ev); - w->processed_pkts++; + + /* release barrier here ensures stored operation + * of the event completes before the number of + * processed pkts is visible to the main core + */ + __atomic_fetch_add(&(w->processed_pkts), 1, + __ATOMIC_RELEASE); continue; }@@ -180,7 +210,13 @@pipeline_queue_worker_multi_stage_fwd(void *arg) ev.queue_id = tx_queue[ev.mbuf->port]; rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0); pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC); - w->processed_pkts++; + + /* release barrier here ensures stored operation + * of the event completes before the number of + * processed pkts is visible to the main core + */ + __atomic_fetch_add(&(w->processed_pkts), 1, + __ATOMIC_RELEASE); } else { ev.queue_id++; pipeline_fwd_event(&ev, sched_type_list[cq_id]);@@ -214,7 +250,13 @@pipeline_queue_worker_multi_stage_burst_tx(void *arg) if (ev[i].queue_id == tx_queue[ev[i].mbuf-quoted
port]) {pipeline_event_tx(dev, port, &ev[i]); ev[i].op = RTE_EVENT_OP_RELEASE; - w->processed_pkts++; + + /* release barrier here ensures stored operation + * of the event completes before the number of + * processed pkts is visible to the main core + */ + __atomic_fetch_add(&(w-quoted
processed_pkts), 1,+ __ATOMIC_RELEASE); continue; }@@ -254,7 +296,13 @@pipeline_queue_worker_multi_stage_burst_fwd(void *arg) rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0); pipeline_fwd_event(&ev[i], RTE_SCHED_TYPE_ATOMIC); - w->processed_pkts++; + + /* release barrier here ensures stored operation + * of the event completes before the number of + * processed pkts is visible to the main core + */ + __atomic_fetch_add(&(w-quoted
processed_pkts), 1,+ __ATOMIC_RELEASE); } else { ev[i].queue_id++; pipeline_fwd_event(&ev[i], -- 2.17.1