Thread: 16 messages, 3 authors, 2021-07-13

Re: [dpdk-dev] [PATCH 2/2] net/mlx5: reduce unnecessary memory access

From: Slava Ovsiienko <hidden>
Date: 2021-07-02 07:06:01

Hi, Ruifeng

Could we go further and move the loop inside the conditional?
Like this:
if (mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) > 1) {
	for (i = 0; i < n; ++i) {
		void *buf_addr = elts[i]->buf_addr;

		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
					      RTE_PKTMBUF_HEADROOM);
		wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
	}
} else {
	for (i = 0; i < n; ++i) {
		void *buf_addr = elts[i]->buf_addr;

		wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
					      RTE_PKTMBUF_HEADROOM);
	}
}
What do you think?
Also, we should check that performance on other architectures is not affected.
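
For comparing the two shapes outside the driver, here is a minimal standalone sketch. It is not mlx5 PMD code: `struct seg`, `struct buf`, `HEADROOM`, `lookup_lkey()`, `fill_hoisted()`, `fill_unswitched()` and `variants_agree()` are hypothetical stand-ins for illustration, and the byte swap done by rte_cpu_to_be_64() is omitted. It only shows that the loop with the hoisted length check and the loop moved inside the conditional fill the segments identically, so either form can be timed per architecture.

```c
#include <stddef.h>
#include <stdint.h>

#define HEADROOM 128u /* assumption: stand-in for RTE_PKTMBUF_HEADROOM */

/* Hypothetical stand-ins for mlx5_wqe_data_seg and rte_mbuf. */
struct seg { uint64_t addr; uint32_t lkey; };
struct buf { void *buf_addr; uint32_t mr_key; };

/* Hypothetical stand-in for mlx5_rx_mb2mr(): returns a per-buffer key. */
static uint32_t lookup_lkey(const struct buf *b)
{
	return b->mr_key;
}

/* Length read once by the caller, branch still evaluated per iteration. */
static void fill_hoisted(struct seg *wq, struct buf *const *elts,
			 unsigned int n, uint16_t btree_len)
{
	unsigned int i;

	for (i = 0; i < n; ++i) {
		wq[i].addr = (uint64_t)(uintptr_t)elts[i]->buf_addr + HEADROOM;
		if (btree_len > 1)
			wq[i].lkey = lookup_lkey(elts[i]);
	}
}

/* Branch taken once, with a separate loop on each side. */
static void fill_unswitched(struct seg *wq, struct buf *const *elts,
			    unsigned int n, uint16_t btree_len)
{
	unsigned int i;

	if (btree_len > 1) {
		for (i = 0; i < n; ++i) {
			wq[i].addr = (uint64_t)(uintptr_t)elts[i]->buf_addr +
				     HEADROOM;
			wq[i].lkey = lookup_lkey(elts[i]);
		}
	} else {
		for (i = 0; i < n; ++i)
			wq[i].addr = (uint64_t)(uintptr_t)elts[i]->buf_addr +
				     HEADROOM;
	}
}

/* Returns 1 if both variants leave identical segments, 0 otherwise. */
int variants_agree(uint16_t btree_len)
{
	enum { N = 8 };
	static char storage[N][16];
	struct buf bufs[N];
	struct buf *elts[N];
	struct seg a[N], b[N];
	unsigned int i;

	for (i = 0; i < N; ++i) {
		bufs[i].buf_addr = storage[i];
		bufs[i].mr_key = 100 + i;
		elts[i] = &bufs[i];
		a[i].addr = b[i].addr = 0;
		a[i].lkey = b[i].lkey = 7; /* preset key, kept when len <= 1 */
	}
	fill_hoisted(a, elts, N, btree_len);
	fill_unswitched(b, elts, N, btree_len);
	for (i = 0; i < N; ++i)
		if (a[i].addr != b[i].addr || a[i].lkey != b[i].lkey)
			return 0;
	return 1;
}
```

Wrapping each fill function in a timing loop would give a rough per-architecture comparison. A compiler may already perform this unswitching on its own at higher optimization levels, so the measured difference could be small.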

With best regards,
Slava
-----Original Message-----
From: Ruifeng Wang <redacted>
Sent: Tuesday, June 1, 2021 11:31
To: Raslan Darawsheh <redacted>; Matan Azrad
[off-list ref]; Shahaf Shuler [off-list ref]; Slava Ovsiienko
[off-list ref]
Cc: dev@dpdk.org; jerinj@marvell.com; nd@arm.com;
honnappa.nagarahalli@arm.com; Ruifeng Wang [off-list ref]
Subject: [PATCH 2/2] net/mlx5: reduce unnecessary memory access

MR btree len is a constant during Rx replenish.
Moved the retrieval of the value out of the loop to reduce data loads.
A slight performance uplift was measured on N1SDP.

Signed-off-by: Ruifeng Wang <redacted>
---
 drivers/net/mlx5/mlx5_rxtx_vec.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
index d5af2d91ff..fc7e2a7f41 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec.c
+++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
@@ -95,6 +95,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
 	volatile struct mlx5_wqe_data_seg *wq =
 		&((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[elts_idx];
 	unsigned int i;
+	uint16_t btree_len;

 	if (n >= rxq->rq_repl_thresh) {
 		MLX5_ASSERT(n >= MLX5_VPMD_RXQ_RPLNSH_THRESH(q_n));
@@ -106,6 +107,8 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
 			rxq->stats.rx_nombuf += n;
 			return;
 		}
+
+		btree_len = mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh);
 		for (i = 0; i < n; ++i) {
 			void *buf_addr;
@@ -119,8 +122,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
 			wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
 						      RTE_PKTMBUF_HEADROOM);
 			/* If there's a single MR, no need to replace LKey. */
-			if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh)
-				     > 1))
+			if (unlikely(btree_len > 1))
 				wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
 		}
 		rxq->rq_ci += n;
--
2.25.1
  