[dpdk-dev] 回复: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline test
From: Feifei Wang <hidden>
Date: 2021-01-08 07:13:09
Hi, Pavan
-----邮件原件-----
发件人: Pavan Nikhilesh Bhagavatula [off-list ref]
发送时间: 2021年1月5日 17:29
收件人: Feifei Wang [off-list ref]; jerinj@marvell.com; Harry
van Haaren [off-list ref]
抄送: dev@dpdk.org; nd [off-list ref]; Honnappa Nagarahalli
[off-list ref]; stable@dpdk.org; Ruifeng Wang
[off-list ref]; nd [off-list ref]
主题: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers for pipeline
test
Hi Feifei,
quoted
Hi, Pavan
quoted
quoted
Sorry for my late reply and thanks very much for your review.
quoted
quoted
quoted
-----Original Message-----
quoted
quoted
From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.com>>
quoted
quoted
Sent: 2020年12月22日 18:33
quoted
quoted
To: Feifei Wang <Feifei.Wang2@arm.com<mailto:Feifei.Wang2@arm.com>>; jerinj@marvell.com<mailto:jerinj@marvell.com>;
quoted
Harry van
quoted
quoted
Haaren <harry.van.haaren@intel.com<mailto:harry.van.haaren@intel.com>>; Pavan Nikhilesh
quoted
quoted
<pbhagavatula@caviumnetworks.com<mailto:pbhagavatula@caviumnetworks.com>>
quoted
quoted
Cc: dev@dpdk.org<mailto:dev@dpdk.org>; nd <nd@arm.com<mailto:nd@arm.com>>; Honnappa Nagarahalli
quoted
quoted
<Honnappa.Nagarahalli@arm.com<mailto:Honnappa.Nagarahalli@arm.com>>; stable@dpdk.org<mailto:stable@dpdk.org>; Phil Yang
quoted
quoted
<Phil.Yang@arm.com<mailto:Phil.Yang@arm.com>>
quoted
quoted
Subject: RE: [RFC PATCH v1 4/6] app/eventdev: add release barriers
quoted
for
quoted
quoted
pipeline test
quoted
quoted
quoted
quoted
quoted
quoted
quoted
Add release barriers before updating the processed packets for
quoted
worker
quoted
quoted
quoted
lcores to ensure the worker lcore has really finished data
quoted
quoted
quoted
processing and then it can update the processed packets number.
quoted
quoted
quoted
quoted
quoted
quoted
quoted
I believe we can live with minor inaccuracies in stats being
quoted
quoted
presented
quoted
as
quoted
quoted
atomics are pretty heavy when scheduler is limited to burst size as 1.
quoted
quoted
quoted
quoted
One option is to move it before a pipeline operation
quoted
(pipeline_event_tx,
quoted
quoted
pipeline_fwd_event etc.) as they imply implicit release barrier (as
quoted
quoted
all
quoted
the
quoted
quoted
changes done to the event should be visible to the next core).
quoted
quoted
If I understand correctly, your meaning is that move release barriers
quoted
before pipeline_event_tx or pipeline_fwd_event. This can ensure the
quoted
event has been processed before the next core begins to tx/fwd. For
quoted
example:
What I meant was event APIs such as `rte_event_enqueue_burst`,
`rte_event_eth_tx_adapter_enqueue`
act as an implicit release barrier and the API `rte_event_dequeue_burst` act
as an implicit acquire barrier.
Since, pipeline_* test starts with a dequeue() and ends with an enqueue() I
don’t believe we need barriers in Between.
Sorry for my misunderstanding this. And I agree with you that no barriers are
needed between dequeue and enqueue.
Now, let's go back to the beginning. Actually with this patch, our barrier is mainly
for the synchronous variable " w->processed_pkts ". As we all know, the event is firstly
dequeued and then enqueued, after this, the event can be treated as the processed event
and included in the statistics("w->processed_pkts++").
Thus, we add a release barrier before " w->processed_pkts++" is to prevent this operation
being executed ahead of time. For example:
dequeue -> w->processed_pkts++ -> enqueue
This cause that the worker doesn't actually finish this event processing, but the event is treated
as the processed one and included in the statistics.
______________________________________________________________________________
By the way, I have two other questions about pipeline process test in "test_pipeline_queue".
1. when do we start counting processed events (w->processed_pkts)?
For the fwd mode (internal_port = false), when we choose single stage, application increments
the number events processed after "pipeline_event enqueue". However, when we choose multiple
stage, application increments the number events processed before "pipleline_event_enqueue". So,
maybe we can unify this. For example of multiple stage:
if (cq_id == last_queue) {
ev.queue_id = tx_queue[ev.mbuf->port];
rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
+ pipeline_event_enqueue(dev, port, &ev);
w->processed_pkts++;
} else {
ev.queue_id++;
pipeline_fwd_event(&ev, sched_type_list[cq_id]);
+ pipeline_event_enqueue(dev, port, &ev);
}
- pipeline_event_enqueue(dev, port, &ev);
2. Whether "pipeline_event_enqueue" is needed after "pipeline_event_tx" for tx mode?
For single_stage_burst_tx mode, after "pipeline_event_tx", the worker has to enqueue again
due to "pipeline_event_enqueue_burst", so maybe we should jump out of the loop after
“pipeline_event_tx”, for example:
if (ev[i].sched_type == RTE_SCHED_TYPE_ATOMIC) {
pipeline_event_tx(dev, port, &ev[i]);
ev[i].op = RTE_EVENT_OP_RELEASE;
w->processed_pkts++;
+ continue;
} else {
ev[i].queue_id++;
pipeline_fwd_event(&ev[i],
RTE_SCHED_TYPE_ATOMIC);
}
}
pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
quoted
quoted
if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
quoted
+ __atomic_thread_fence(__ATOMIC_RELEASE);
quoted
pipeline_event_tx(dev, port, &ev);
quoted
w->processed_pkts++;
quoted
} else {
quoted
ev.queue_id++;
quoted
+ __atomic_thread_fence(__ATOMIC_RELEASE);
quoted
pipeline_fwd_event(&ev,
quoted
RTE_SCHED_TYPE_ATOMIC);
quoted
pipeline_event_enqueue(dev, port, &ev);
quoted
quoted
However, there are two reasons to prevent this:
quoted
quoted
First, compare with other tests in app/eventdev, for example, the
quoted
eventdev perf test, the wmb is after event operation to ensure
quoted
operation has been finished and then w->processed_pkts++.
In case of perf_* tests start with a dequeue() and finally ends with a
mempool_put() should also act as implicit acquire release pairs making stats
consistent?
For perf tests, this consistency refers to that there is a wmb after mempool_put(). Please refer to this link: http://patches.dpdk.org/patch/85634/
quoted
So, if we move release barriers before tx/fwd, it may cause that the
quoted
tests of app/eventdev become inconsistent.This may reduce the
quoted
maintainability of the code and make it difficult to understand.
quoted
quoted
Second, it is a test case, though heavy thread may cause performance
quoted
degradation, it can ensure that the operation process and the test
quoted
result are correct. And maybe for a test case, correctness is more
quoted
important than performance.
quoted
Most of our internal perf test run on 24/48 core combinations and since
Octeontx2 event device driver supports a burst size of 1, it will show up as
Huge performance degradation.
For the impact on performance, I do the test using software driver, following are some test results: ------------------------------------------------------------------------------------------------------------------------------------ Architecture: aarch64 Nics: ixgbe-82599 CPU: Cortex-A72 BURST_SIZE: 1 Order: ./dpdk-test-eventdev -l 0-15 -s 0x2 --vdev=event_sw0 -- --test=pipeline_queue --wlcore=4-14 --prod_type_ethdev --stlist=a,a Flow: one flow, 64bits package, TX rate: 1.4Mpps Without this patch: 0.954 mpps avg 0.953 mpps With this patch: 0.932 mpps avg 0.930 mpps ------------------------------------------------------------------------------------------------------------------------------------ Based on the result above, there is no significant performance degradation with this patch. This is because the release barrier is only for “w->processed_pkts++”. It just ensures that the worker core increments the number events processed after enqueue, and it doesn’t affect dequeue/enqueue: dequeue -> enqueue -> release barrier -> w->processed_pkts++ On the other hand, I infer the reason for the slight decrease in measurement performance is that the release barrier prevent “w->processed_pkts++” before that the event has been processed (enqueue). But I think this test result is closer to the real performance. And sorry for that we have no octentx2 device, so there is no test result on Octeontx2 event device driver. Would you please help us test this patch on octentx2 when you are convenient. Thanks very much. Best Regards Feifei
quoted
So, due to two reasons above, I'm ambivalent about how we should do in
quoted
the next step.
quoted
quoted
Best Regards
quoted
Feifei
Regards,
Pavan.
quoted
quoted
quoted
quoted
Fixes: 314bcf58ca8f ("app/eventdev: add pipeline queue worker
quoted
quoted
quoted
functions")
quoted
quoted
quoted
Cc: pbhagavatula@marvell.com<mailto:pbhagavatula@marvell.com>
quoted
quoted
quoted
Cc: stable@dpdk.org<mailto:stable@dpdk.org>
quoted
quoted
quoted
quoted
quoted
quoted
Signed-off-by: Phil Yang <phil.yang@arm.com<mailto:phil.yang@arm.com>>
quoted
quoted
quoted
Signed-off-by: Feifei Wang <feifei.wang2@arm.com<mailto:feifei.wang2@arm.com>>
quoted
quoted
quoted
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com<mailto:ruifeng.wang@arm.com>>
quoted
quoted
quoted
---
quoted
quoted
quoted
app/test-eventdev/test_pipeline_queue.c | 64
quoted
quoted
quoted
+++++++++++++++++++++----
quoted
quoted
quoted
1 file changed, 56 insertions(+), 8 deletions(-)
quoted
quoted
quoted
quoted
quoted
quoted
diff --git a/app/test-eventdev/test_pipeline_queue.c b/app/test-
quoted
quoted
quoted
eventdev/test_pipeline_queue.c index 7bebac34f..0c0ec0ceb
quoted
100644
quoted
quoted
quoted
--- a/app/test-eventdev/test_pipeline_queue.c
quoted
quoted
quoted
+++ b/app/test-eventdev/test_pipeline_queue.c
quoted
quoted
quoted
@@ -30,7 +30,13 @@ pipeline_queue_worker_single_stage_tx(void
quoted
quoted
quoted
*arg)
quoted
quoted
quoted
quoted
quoted
quoted
if (ev.sched_type == RTE_SCHED_TYPE_ATOMIC) {
quoted
quoted
quoted
pipeline_event_tx(dev, port, &ev);
quoted
quoted
quoted
- w->processed_pkts++;
quoted
quoted
quoted
+
quoted
quoted
quoted
+ /* release barrier here ensures stored operation
quoted
quoted
quoted
+ * of the event completes before the number of
quoted
quoted
quoted
+ * processed pkts is visible to the main core
quoted
quoted
quoted
+ */
quoted
quoted
quoted
+ __atomic_fetch_add(&(w->processed_pkts), 1,
quoted
quoted
quoted
+ __ATOMIC_RELEASE);
quoted
quoted
quoted
} else {
quoted
quoted
quoted
ev.queue_id++;
quoted
quoted
quoted
pipeline_fwd_event(&ev,
quoted
quoted
quoted
RTE_SCHED_TYPE_ATOMIC);
quoted
quoted
quoted
@@ -59,7 +65,13 @@
quoted
pipeline_queue_worker_single_stage_fwd(void
quoted
quoted
quoted
*arg)
quoted
quoted
quoted
rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
quoted
quoted
quoted
pipeline_fwd_event(&ev, RTE_SCHED_TYPE_ATOMIC);
quoted
quoted
quoted
pipeline_event_enqueue(dev, port, &ev);
quoted
quoted
quoted
- w->processed_pkts++;
quoted
quoted
quoted
+
quoted
quoted
quoted
+ /* release barrier here ensures stored operation
quoted
quoted
quoted
+ * of the event completes before the number of
quoted
quoted
quoted
+ * processed pkts is visible to the main core
quoted
quoted
quoted
+ */
quoted
quoted
quoted
+ __atomic_fetch_add(&(w->processed_pkts), 1,
quoted
quoted
quoted
+ __ATOMIC_RELEASE);
quoted
quoted
quoted
}
quoted
quoted
quoted
quoted
quoted
quoted
return 0;
quoted
quoted
quoted
@@ -84,7 +96,13 @@
quoted
quoted
quoted
pipeline_queue_worker_single_stage_burst_tx(void *arg)
quoted
quoted
quoted
if (ev[i].sched_type ==
quoted
quoted
quoted
RTE_SCHED_TYPE_ATOMIC) {
quoted
quoted
quoted
pipeline_event_tx(dev, port, &ev[i]);
quoted
quoted
quoted
ev[i].op = RTE_EVENT_OP_RELEASE;
quoted
quoted
quoted
- w->processed_pkts++;
quoted
quoted
quoted
+
quoted
quoted
quoted
+ /* release barrier here ensures stored
quoted
quoted
quoted
operation
quoted
quoted
quoted
+ * of the event completes before the
quoted
quoted
quoted
number of
quoted
quoted
quoted
+ * processed pkts is visible to the main
quoted
quoted
quoted
core
quoted
quoted
quoted
+ */
quoted
quoted
quoted
+ __atomic_fetch_add(&(w-
quoted
quoted
quoted
quoted
processed_pkts), 1,
quoted
quoted
quoted
+ __ATOMIC_RELEASE);
quoted
quoted
quoted
} else {
quoted
quoted
quoted
ev[i].queue_id++;
quoted
quoted
quoted
pipeline_fwd_event(&ev[i],
quoted
quoted
quoted
@@ -121,7 +139,13 @@
quoted
quoted
quoted
pipeline_queue_worker_single_stage_burst_fwd(void *arg)
quoted
quoted
quoted
}
quoted
quoted
quoted
quoted
quoted
quoted
pipeline_event_enqueue_burst(dev, port, ev, nb_rx);
quoted
quoted
quoted
- w->processed_pkts += nb_rx;
quoted
quoted
quoted
+
quoted
quoted
quoted
+ /* release barrier here ensures stored operation
quoted
quoted
quoted
+ * of the event completes before the number of
quoted
quoted
quoted
+ * processed pkts is visible to the main core
quoted
quoted
quoted
+ */
quoted
quoted
quoted
+ __atomic_fetch_add(&(w->processed_pkts), nb_rx,
quoted
quoted
quoted
+ __ATOMIC_RELEASE);
quoted
quoted
quoted
}
quoted
quoted
quoted
quoted
quoted
quoted
return 0;
quoted
quoted
quoted
@@ -146,7 +170,13 @@
quoted
pipeline_queue_worker_multi_stage_tx(void
quoted
quoted
quoted
*arg)
quoted
quoted
quoted
quoted
quoted
quoted
if (ev.queue_id == tx_queue[ev.mbuf->port]) {
quoted
quoted
quoted
pipeline_event_tx(dev, port, &ev);
quoted
quoted
quoted
- w->processed_pkts++;
quoted
quoted
quoted
+
quoted
quoted
quoted
+ /* release barrier here ensures stored operation
quoted
quoted
quoted
+ * of the event completes before the number of
quoted
quoted
quoted
+ * processed pkts is visible to the main core
quoted
quoted
quoted
+ */
quoted
quoted
quoted
+ __atomic_fetch_add(&(w->processed_pkts), 1,
quoted
quoted
quoted
+ __ATOMIC_RELEASE);
quoted
quoted
quoted
continue;
quoted
quoted
quoted
}
quoted
quoted
quoted
quoted
quoted
quoted
@@ -180,7 +210,13 @@
quoted
quoted
quoted
pipeline_queue_worker_multi_stage_fwd(void *arg)
quoted
quoted
quoted
ev.queue_id = tx_queue[ev.mbuf->port];
quoted
quoted
quoted
rte_event_eth_tx_adapter_txq_set(ev.mbuf, 0);
quoted
quoted
quoted
pipeline_fwd_event(&ev,
quoted
quoted
quoted
RTE_SCHED_TYPE_ATOMIC);
quoted
quoted
quoted
- w->processed_pkts++;
quoted
quoted
quoted
+
quoted
quoted
quoted
+ /* release barrier here ensures stored operation
quoted
quoted
quoted
+ * of the event completes before the number of
quoted
quoted
quoted
+ * processed pkts is visible to the main core
quoted
quoted
quoted
+ */
quoted
quoted
quoted
+ __atomic_fetch_add(&(w->processed_pkts), 1,
quoted
quoted
quoted
+ __ATOMIC_RELEASE);
quoted
quoted
quoted
} else {
quoted
quoted
quoted
ev.queue_id++;
quoted
quoted
quoted
pipeline_fwd_event(&ev,
quoted
quoted
quoted
sched_type_list[cq_id]);
quoted
quoted
quoted
@@ -214,7 +250,13 @@
quoted
quoted
quoted
pipeline_queue_worker_multi_stage_burst_tx(void *arg)
quoted
quoted
quoted
if (ev[i].queue_id == tx_queue[ev[i].mbuf-
quoted
quoted
quoted
quoted
port]) {
quoted
quoted
quoted
pipeline_event_tx(dev, port, &ev[i]);
quoted
quoted
quoted
ev[i].op = RTE_EVENT_OP_RELEASE;
quoted
quoted
quoted
- w->processed_pkts++;
quoted
quoted
quoted
+
quoted
quoted
quoted
+ /* release barrier here ensures stored
quoted
quoted
quoted
operation
quoted
quoted
quoted
+ * of the event completes before the
quoted
quoted
quoted
number of
quoted
quoted
quoted
+ * processed pkts is visible to the main
quoted
quoted
quoted
core
quoted
quoted
quoted
+ */
quoted
quoted
quoted
+ __atomic_fetch_add(&(w-
quoted
quoted
quoted
quoted
processed_pkts), 1,
quoted
quoted
quoted
+ __ATOMIC_RELEASE);
quoted
quoted
quoted
continue;
quoted
quoted
quoted
}
quoted
quoted
quoted
quoted
quoted
quoted
@@ -254,7 +296,13 @@
quoted
quoted
quoted
pipeline_queue_worker_multi_stage_burst_fwd(void *arg)
quoted
quoted
quoted
quoted
quoted
quoted
rte_event_eth_tx_adapter_txq_set(ev[i].mbuf, 0);
quoted
quoted
quoted
pipeline_fwd_event(&ev[i],
quoted
quoted
quoted
quoted
quoted
quoted
RTE_SCHED_TYPE_ATOMIC);
quoted
quoted
quoted
- w->processed_pkts++;
quoted
quoted
quoted
+
quoted
quoted
quoted
+ /* release barrier here ensures stored
quoted
quoted
quoted
operation
quoted
quoted
quoted
+ * of the event completes before the
quoted
quoted
quoted
number of
quoted
quoted
quoted
+ * processed pkts is visible to the main
quoted
quoted
quoted
core
quoted
quoted
quoted
+ */
quoted
quoted
quoted
+ __atomic_fetch_add(&(w-
quoted
quoted
quoted
quoted
processed_pkts), 1,
quoted
quoted
quoted
+ __ATOMIC_RELEASE);
quoted
quoted
quoted
} else {
quoted
quoted
quoted
ev[i].queue_id++;
quoted
quoted
quoted
pipeline_fwd_event(&ev[i],
quoted
quoted
quoted
--
quoted
quoted
quoted
2.17.1