Thread (266 messages) 266 messages, 14 authors, 2021-11-05

Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common fwd wrapper function

From: Xueming(Steven) Li <hidden>
Date: 2021-08-29 07:07:53

-----Original Message-----
From: Jerin Jacob <redacted>
Sent: Thursday, August 26, 2021 7:28 PM
To: Xueming(Steven) Li <redacted>
Cc: Jack Min <redacted>; dpdk-dev <redacted>; Xiaoyun Li <redacted>
Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common fwd wrapper function

On Wed, Aug 18, 2021 at 7:38 PM Xueming(Steven) Li [off-list ref] wrote:
quoted

quoted
-----Original Message-----
From: Jerin Jacob <redacted>
Sent: Wednesday, August 18, 2021 7:48 PM
To: Xueming(Steven) Li <redacted>
Cc: Jack Min <redacted>; dpdk-dev <redacted>; Xiaoyun
Li [off-list ref]
Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common fwd
wrapper function

On Wed, Aug 18, 2021 at 4:57 PM Xueming(Steven) Li [off-list ref] wrote:
quoted

quoted
-----Original Message-----
From: Jerin Jacob <redacted>
Sent: Tuesday, August 17, 2021 5:37 PM
To: Xueming(Steven) Li <redacted>
Cc: Jack Min <redacted>; dpdk-dev <redacted>;
Xiaoyun Li [off-list ref]
Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common
fwd wrapper function

On Wed, Aug 11, 2021 at 7:35 PM Xueming Li [off-list ref] wrote:
quoted
From: Xiaoyu Min <redacted>

Added an inline common wrapper function for all fwd engines
which do the following in common:

1. get_start_cycles
2. rte_eth_rx_burst(...,nb_pkt_per_burst)
3. if rxq_share do forward_shared_rxq(), otherwise do fwd directly 4.
get_end_cycle

Signed-off-by: Xiaoyu Min <redacted>
---
 app/test-pmd/testpmd.h | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index
13141dfed9..b685ac48d6 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -1022,6 +1022,30 @@ void add_tx_dynf_callback(portid_t
portid); void remove_tx_dynf_callback(portid_t portid);  int
update_jumbo_frame_offload(portid_t portid);

+static inline void
+do_burst_fwd(struct fwd_stream *fs, packet_fwd_cb fwd) {
+       struct rte_mbuf *pkts_burst[MAX_PKT_BURST];
+       uint16_t nb_rx;
+       uint64_t start_tsc = 0;
+
+       get_start_cycles(&start_tsc);
+
+       /*
+        * Receive a burst of packets and forward them.
+        */
+       nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue,
+                       pkts_burst, nb_pkt_per_burst);
+       inc_rx_burst_stats(fs, nb_rx);
+       if (unlikely(nb_rx == 0))
+               return;
+       if (unlikely(rxq_share > 0))
See below. It reads a global memory.
quoted
+               forward_shared_rxq(fs, nb_rx, pkts_burst, fwd);
+       else
+               (*fwd)(fs, nb_rx, pkts_burst);
New function pointer in fastpath.

IMO, We should not create performance regression for the existing forward engine.
Can we have a new forward engine just for shared memory testing?
Yes, fully aware of the performance concern, the global could be defined around record_core_cycles to minimize the impacts.
Based on test data, the impacts almost invisible in legacy mode.
Are you saying there is zero % regression? If not, could you share the data?
Almost zero, here is a quick single core result of rxonly:
        32.2Mpps, 58.9cycles/packet
Revert the patch to rxonly.c:
        32.1Mpps 59.9cycles/packet
The result doesn't make sense and I realized that I used batch mbuf free, apply it now:
        32.2Mpps, 58.9cycles/packet
There were small digit jumps between testpmd restart, I picked the best one.
The result is almost same, seems the cost of each packet is small enough.
BTW, I'm testing with default burst size and queue depth.
I tested this on octeontx2 with iofwd with single core with 100Gbps Without this patch - 73.5mpps With this patch - 72.8 mpps

We are taking the shared queue runtime option without a separate fwd engine.
and to have zero performance impact and no compile time flag Then I think, only way to have a function template .
Example change to outline function template principle.

static inline
__pkt_burst_io_forward(struct fwd_stream *fs, const u64 flag) {

Introduce new checks under
if (flags & SHARED_QUEUE)


}

Have two versions of io_fwd_engine.packet_fwd per engine.

- first version
static pkt_burst_io_forward(struct fwd_stream *fs) {
        return __pkt_burst_io_forward(fs, 0); }

- Second version
static pkt_burst_io_forward_shared_queue(struct fwd_stream *fs) {
        return __pkt_burst_io_forward(fs, SHARED_QUEUE); }


Update io_fwd_engine.packet_fwd in slowpath to respective version based on offload.

If shared offoad is not selected, pkt_burst_io_forward() will be selected and
__pkt_burst_io_forward() will be a compile time version of !SHARED_QUEUE aka same as existing coe.
Thanks for testing and suggestion. So the only difference here in above code is access to rxq_shared changed
to function parameter, right? Have you tested this performance? If not, I could verify.
quoted
quoted
quoted
From test perspective, better to have all forward engine to verify
shared rxq, test team want to run the regression with less
impacts. Hope to have a solution to utilize all forwarding engines
seamlessly.

Yes. it good goal. testpmd forward performance using as synthetic bench everyone.
I think, we are aligned to not have any regression for the generic forward engine.
quoted
quoted
quoted
+       get_end_cycles(fs, start_tsc); }
+
 /*
  * Work-around of a compilation error with ICC on invocations of the
  * rte_be_to_cpu_16() function.
--
2.25.1
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help