Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common fwd wrapper function
From: Jerin Jacob <hidden>
Date: 2021-08-26 11:28:47
On Wed, Aug 18, 2021 at 7:38 PM Xueming(Steven) Li [off-list ref] wrote:
quoted
-----Original Message----- From: Jerin Jacob <redacted> Sent: Wednesday, August 18, 2021 7:48 PM To: Xueming(Steven) Li <redacted> Cc: Jack Min <redacted>; dpdk-dev <redacted>; Xiaoyun Li <redacted> Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common fwd wrapper function On Wed, Aug 18, 2021 at 4:57 PM Xueming(Steven) Li [off-list ref] wrote:quoted
quoted
-----Original Message----- From: Jerin Jacob <redacted> Sent: Tuesday, August 17, 2021 5:37 PM To: Xueming(Steven) Li <redacted> Cc: Jack Min <redacted>; dpdk-dev <redacted>; Xiaoyun Li [off-list ref] Subject: Re: [dpdk-dev] [PATCH v2 06/15] app/testpmd: add common fwd wrapper function On Wed, Aug 11, 2021 at 7:35 PM Xueming Li [off-list ref] wrote:quoted
From: Xiaoyu Min <redacted> Added an inline common wrapper function for all fwd engines which do the following in common: 1. get_start_cycles 2. rte_eth_rx_burst(...,nb_pkt_per_burst) 3. if rxq_share do forward_shared_rxq(), otherwise do fwd directly 4. get_end_cycle Signed-off-by: Xiaoyu Min <redacted> --- app/test-pmd/testpmd.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+)diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h index13141dfed9..b685ac48d6 100644--- a/app/test-pmd/testpmd.h +++ b/app/test-pmd/testpmd.h@@ -1022,6 +1022,30 @@ void add_tx_dynf_callback(portid_t portid);void remove_tx_dynf_callback(portid_t portid); int update_jumbo_frame_offload(portid_t portid); +static inline void +do_burst_fwd(struct fwd_stream *fs, packet_fwd_cb fwd) { + struct rte_mbuf *pkts_burst[MAX_PKT_BURST]; + uint16_t nb_rx; + uint64_t start_tsc = 0; + + get_start_cycles(&start_tsc); + + /* + * Receive a burst of packets and forward them. + */ + nb_rx = rte_eth_rx_burst(fs->rx_port, fs->rx_queue, + pkts_burst, nb_pkt_per_burst); + inc_rx_burst_stats(fs, nb_rx); + if (unlikely(nb_rx == 0)) + return; + if (unlikely(rxq_share > 0))See below. It reads a global memory.quoted
+ forward_shared_rxq(fs, nb_rx, pkts_burst, fwd); + else + (*fwd)(fs, nb_rx, pkts_burst);New function pointer in fastpath. IMO, We should not create performance regression for the existing forward engine. Can we have a new forward engine just for shared memory testing?Yes, fully aware of the performance concern, the global could be defined around record_core_cycles to minimize the impacts. Based on test data, the impacts almost invisible in legacy mode.Are you saying there is zero % regression? If not, could you share the data?Almost zero, here is a quick single core result of rxonly: 32.2Mpps, 58.9cycles/packet Revert the patch to rxonly.c: 32.1Mpps 59.9cycles/packet The result doesn't make sense and I realized that I used batch mbuf free, apply it now: 32.2Mpps, 58.9cycles/packet There were small digit jumps between testpmd restart, I picked the best one. The result is almost same, seems the cost of each packet is small enough. BTW, I'm testing with default burst size and queue depth.
I tested this on octeontx2 with iofwd with single core with 100Gbps
Without this patch - 73.5mpps
With this patch - 72.8 mpps
We are taking the shared queue runtime option without a separate fwd engine.
and to have zero performance impact and no compile time flag
Then I think, only way to have a function template .
Example change to outline function template principle.
static inline
__pkt_burst_io_forward(struct fwd_stream *fs, const u64 flag)
{
Introduce new checks under
if (flags & SHARED_QUEUE)
}
Have two versions of io_fwd_engine.packet_fwd per engine.
- first version
static pkt_burst_io_forward(struct fwd_stream *fs)
{
return __pkt_burst_io_forward(fs, 0);
}
- Second version
static pkt_burst_io_forward_shared_queue(struct fwd_stream *fs)
{
return __pkt_burst_io_forward(fs, SHARED_QUEUE);
}
Update io_fwd_engine.packet_fwd in slowpath to respective version
based on offload.
If shared offoad is not selected, pkt_burst_io_forward() will be selected and
__pkt_burst_io_forward() will be a compile time version of
!SHARED_QUEUE aka same as existing coe.
quoted
quoted
From test perspective, better to have all forward engine to verify shared rxq, test team want to run the regression with less impacts. Hope to have a solution to utilize all forwarding enginesseamlessly. Yes. it good goal. testpmd forward performance using as synthetic bench everyone. I think, we are aligned to not have any regression for the generic forward engine.quoted
quoted
quoted
+ get_end_cycles(fs, start_tsc); } + /* * Work-around of a compilation error with ICC on invocations of the * rte_be_to_cpu_16() function. -- 2.25.1