[dpdk-dev] Re: [PATCH v2 1/2] net/i40e: improve performance for scalar Tx

From: Feifei Wang <hidden>
Date: 2021-06-30 05:30:54

-----Original Message-----
From: Xing, Beilei [off-list ref]
Sent: June 30, 2021 11:43
To: Feifei Wang [off-list ref]
Cc: dev@dpdk.org; nd [off-list ref]; Ruifeng Wang
[off-list ref]
Subject: RE: [PATCH v2 1/2] net/i40e: improve performance for scalar Tx

-----Original Message-----
From: Feifei Wang <redacted>
Sent: Wednesday, June 30, 2021 10:04 AM
To: Xing, Beilei <redacted>
Cc: dev@dpdk.org; nd@arm.com; Feifei Wang <redacted>;
Ruifeng Wang [off-list ref]
Subject: [PATCH v2 1/2] net/i40e: improve performance for scalar Tx

For the i40e scalar Tx path, enabling FAST_FREE_MBUF mode means that,
per queue, all mbufs come from the same mempool and have refcnt = 1.

Thus we can bulk-free the buffers when mbuf fast free mode is
enabled.
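
To illustrate the reasoning above, here is a minimal, self-contained sketch in plain C. It does not use the DPDK API: `toy_pool`, `toy_mbuf`, and every `toy_*` function are hypothetical stand-ins invented for this example. It only shows why the two guarantees (one mempool per queue, refcnt = 1) allow a batch of buffers to be handed back with a single pool operation instead of one operation per buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_POOL_CAP 128 /* hypothetical capacity, chosen for the example */

/* Toy stand-in for a mempool: a stack of free object pointers. */
struct toy_pool {
	void *slots[TOY_POOL_CAP];
	size_t count;     /* objects currently held by the pool */
	size_t put_calls; /* how many times the pool was touched */
};

/* Toy stand-in for an mbuf: only the fields the argument relies on. */
struct toy_mbuf {
	struct toy_pool *pool;
	unsigned int refcnt;
};

/* Return n objects to the pool with one pool operation. */
static void toy_pool_put_bulk(struct toy_pool *p, void *const *objs, size_t n)
{
	assert(p->count + n <= TOY_POOL_CAP);
	memcpy(&p->slots[p->count], objs, n * sizeof(*objs));
	p->count += n;
	p->put_calls++;
}

/* Per-buffer free: looks up each buffer's pool and touches it once per buffer. */
static void toy_free_each(struct toy_mbuf **bufs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		void *obj = bufs[i];
		toy_pool_put_bulk(bufs[i]->pool, &obj, 1);
	}
}

/*
 * Bulk free: only valid under the fast-free assumptions, i.e. every buffer
 * has refcnt == 1 and shares the pool of bufs[0].
 */
static void toy_free_bulk(struct toy_mbuf **bufs, size_t n)
{
	if (n != 0)
		toy_pool_put_bulk(bufs[0]->pool, (void *const *)bufs, n);
}

/* Free n buffers either way and report how many pool operations it took. */
size_t toy_count_put_calls(int use_bulk, size_t n)
{
	static struct toy_pool pool;
	static struct toy_mbuf storage[TOY_POOL_CAP];
	struct toy_mbuf *bufs[TOY_POOL_CAP];

	assert(n <= TOY_POOL_CAP);
	memset(&pool, 0, sizeof(pool));
	for (size_t i = 0; i < n; i++) {
		storage[i].pool = &pool; /* same pool for the whole queue */
		storage[i].refcnt = 1;   /* no other owner holds a reference */
		bufs[i] = &storage[i];
	}

	if (use_bulk)
		toy_free_bulk(bufs, n);
	else
		toy_free_each(bufs, n);

	assert(pool.count == n); /* both paths return every buffer */
	return pool.put_calls;
}
```

In this toy model, `toy_count_put_calls(0, 32)` touches the pool 32 times, while `toy_count_put_calls(1, 32)` touches it once. Any real gain depends on the mempool implementation; the measured numbers are the ones reported in the commit message.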

Following are the test results with this patch:

MRR L3FWD Test:
two ports & bi-directional flows & one core
RX API: i40e_recv_pkts_bulk_alloc
TX API: i40e_xmit_pkts_simple
ring_descs_size = 1024;
Ring_I40E_TX_MAX_FREE_SZ = 64;
tx_rs_thresh = I40E_DEFAULT_TX_RSBIT_THRESH = 32;
tx_free_thresh = I40E_DEFAULT_TX_FREE_THRESH = 32;

For the scalar path on Arm platforms with the default 'tx_rs_thresh':
on n1sdp, performance is improved by 7.9%; on thunderx2, performance
is improved by 7.6%.

For the scalar path on the x86 platform with the default 'tx_rs_thresh':
performance is improved by 4.7%.

Suggested-by: Ruifeng Wang <redacted>
Signed-off-by: Feifei Wang <redacted>
Reviewed-by: Ruifeng Wang <redacted>
---
 drivers/net/i40e/i40e_rxtx.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/net/i40e/i40e_rxtx.c b/drivers/net/i40e/i40e_rxtx.c
index 6c58decece..8c72391cde 100644
--- a/drivers/net/i40e/i40e_rxtx.c
+++ b/drivers/net/i40e/i40e_rxtx.c
@@ -1294,7 +1294,11 @@ static __rte_always_inline int
 i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 {
 	struct i40e_tx_entry *txep;
-	uint16_t i;
+	int n = txq->tx_rs_thresh;

Thanks for the patch. Just one small comment: can we use 'tx_rs_thresh'
instead of 'n' to make it more readable?

Good point, I will update it. Thanks.

+	uint16_t i = 0, j = 0;
+	struct rte_mbuf *free[RTE_I40E_TX_MAX_FREE_BUF_SZ];
+	const int32_t k = RTE_ALIGN_FLOOR(n, RTE_I40E_TX_MAX_FREE_BUF_SZ);
+	const int32_t m = n % RTE_I40E_TX_MAX_FREE_BUF_SZ;

 	if ((txq->tx_ring[txq->tx_next_dd].cmd_type_offset_bsz &
 			rte_cpu_to_le_64(I40E_TXD_QW1_DTYPE_MASK)) !=
@@ -1307,9 +1311,23 @@ i40e_tx_free_bufs(struct i40e_tx_queue *txq)
 		rte_prefetch0((txep + i)->mbuf);

 	if (txq->offloads & DEV_TX_OFFLOAD_MBUF_FAST_FREE) {
-		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
-			rte_mempool_put(txep->mbuf->pool, txep->mbuf);
-			txep->mbuf = NULL;
+		if (k) {
+			for (j = 0; j != k; j += RTE_I40E_TX_MAX_FREE_BUF_SZ) {
+				for (i = 0; i < RTE_I40E_TX_MAX_FREE_BUF_SZ; ++i, ++txep) {
+					free[i] = txep->mbuf;
+					txep->mbuf = NULL;
+				}
+				rte_mempool_put_bulk(free[0]->pool, (void **)free,
+						RTE_I40E_TX_MAX_FREE_BUF_SZ);
+			}
+		}
+
+		if (m) {
+			for (i = 0; i < m; ++i, ++txep) {
+				free[i] = txep->mbuf;
+				txep->mbuf = NULL;
+			}
+			rte_mempool_put_bulk(free[0]->pool, (void **)free, m);
 		}
 	} else {
 		for (i = 0; i < txq->tx_rs_thresh; ++i, ++txep) {
--
2.25.1
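
The batching arithmetic in the patch above can be checked in isolation. The following is a rough sketch in plain C, not driver code: `TOY_MAX_FREE_BUF_SZ`, `struct free_plan`, and `plan_tx_free()` are hypothetical names made up for this example. The value 64 is taken from the test settings in the commit message and is assumed here to be the size of the `free[]` array; the round-down by integer division is a stand-in for `RTE_ALIGN_FLOOR`. The parameter is named `tx_rs_thresh` rather than `n`, following the review comment.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed batch size, taken from the test settings in the commit message. */
#define TOY_MAX_FREE_BUF_SZ 64

struct free_plan {
	int32_t k;          /* entries covered by full-size batches */
	int32_t m;          /* entries left for the final partial batch */
	int32_t bulk_calls; /* bulk puts issued by the loop shape in the patch */
	int32_t freed;      /* total entries handed back to the pool */
};

/* Walk the same loop structure as the patch, counting instead of freeing. */
struct free_plan plan_tx_free(uint16_t tx_rs_thresh)
{
	struct free_plan p = {0, 0, 0, 0};
	int32_t j;

	/* Round down to a multiple of the batch size (stand-in for RTE_ALIGN_FLOOR). */
	p.k = (tx_rs_thresh / TOY_MAX_FREE_BUF_SZ) * TOY_MAX_FREE_BUF_SZ;
	p.m = tx_rs_thresh % TOY_MAX_FREE_BUF_SZ;

	/* Full batches: one bulk put per TOY_MAX_FREE_BUF_SZ entries. */
	for (j = 0; j != p.k; j += TOY_MAX_FREE_BUF_SZ) {
		p.freed += TOY_MAX_FREE_BUF_SZ;
		p.bulk_calls++;
	}

	/* Remainder: one more bulk put for the entries that did not fill a batch. */
	if (p.m) {
		p.freed += p.m;
		p.bulk_calls++;
	}

	return p;
}
```

With the default threshold of 32 used in the tests, `plan_tx_free(32)` gives k = 0 and m = 32, so only the remainder branch runs and all 32 entries go back in one bulk put. With a threshold of 160 it gives k = 128 and m = 32, which is three bulk puts. In both cases k + m equals the threshold, so no entry is skipped or freed twice.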
