Re: [dpdk-dev] [PATCH 2/2] net/mlx5: reduce unnecessary memory access
From: Slava Ovsiienko <hidden>
Date: 2021-07-02 07:06:01
Hi, Ruifeng
Could we go further and move the loop inside the conditional, so the MR btree length is checked once rather than on every iteration?
Like this:
	if (mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh) > 1) {
		for (i = 0; i < n; ++i) {
			void *buf_addr = elts[i]->buf_addr;

			wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
						      RTE_PKTMBUF_HEADROOM);
			wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
		}
	} else {
		for (i = 0; i < n; ++i) {
			void *buf_addr = elts[i]->buf_addr;

			wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
						      RTE_PKTMBUF_HEADROOM);
		}
	}
What do you think?
Also, we should check that performance on other architectures is not affected.
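To make the comparison easier to try outside the driver, here is a minimal standalone sketch of the restructuring suggested above (usually called loop unswitching). Everything in it is a hypothetical stand-in and not mlx5 code: `struct buf`, `struct seg`, `HEADROOM`, `lookup_key()`, `MAX_N` and the function names are assumptions made for illustration, and the byte swap done by rte_cpu_to_be_64() is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the mlx5 definitions. */
#define HEADROOM 128u	/* plays the role of RTE_PKTMBUF_HEADROOM */
#define MAX_N 64u	/* upper bound used only by the self-check below */

struct buf { uintptr_t addr; uint32_t pool_id; };
struct seg { uint64_t addr; uint32_t lkey; };

/* Stand-in for mlx5_rx_mb2mr(): derive a key from the buffer. */
static uint32_t lookup_key(const struct buf *b)
{
	return 0x1000u + b->pool_id;
}

/* Baseline: the loop-invariant condition is tested on every iteration. */
static void fill_branch_in_loop(struct seg *wq, const struct buf *elts,
				size_t n, unsigned int n_regions)
{
	for (size_t i = 0; i < n; ++i) {
		wq[i].addr = (uint64_t)elts[i].addr + HEADROOM;
		if (n_regions > 1)
			wq[i].lkey = lookup_key(&elts[i]);
	}
}

/* Unswitched: the condition is tested once and selects one of two loops. */
static void fill_unswitched(struct seg *wq, const struct buf *elts,
			    size_t n, unsigned int n_regions)
{
	if (n_regions > 1) {
		for (size_t i = 0; i < n; ++i) {
			wq[i].addr = (uint64_t)elts[i].addr + HEADROOM;
			wq[i].lkey = lookup_key(&elts[i]);
		}
	} else {
		/* Single region: the key written earlier is left untouched. */
		for (size_t i = 0; i < n; ++i)
			wq[i].addr = (uint64_t)elts[i].addr + HEADROOM;
	}
}

/* Returns 1 when both variants produce identical segments, else 0. */
static int variants_agree(const struct buf *elts, size_t n,
			  unsigned int n_regions)
{
	struct seg a[MAX_N], b[MAX_N];

	assert(n <= MAX_N);
	for (size_t i = 0; i < n; ++i) {
		/* Pre-set a key so the "leave lkey alone" path is visible. */
		a[i].addr = b[i].addr = 0;
		a[i].lkey = b[i].lkey = 0xdeadu;
	}
	fill_branch_in_loop(a, elts, n, n_regions);
	fill_unswitched(b, elts, n, n_regions);
	for (size_t i = 0; i < n; ++i)
		if (a[i].addr != b[i].addr || a[i].lkey != b[i].lkey)
			return 0;
	return 1;
}
```

Calling variants_agree() with n_regions set to 1 and to a value above 1 exercises both paths. Timing the two fill functions on each target architecture would be one way to do the check mentioned above, though a compiler may already unswitch the baseline loop at higher optimization levels, so the generated code is worth inspecting as well.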
With best regards,
Slava
> -----Original Message-----
> From: Ruifeng Wang <redacted>
> Sent: Tuesday, June 1, 2021 11:31
> To: Raslan Darawsheh <redacted>; Matan Azrad [off-list ref]; Shahaf Shuler [off-list ref]; Slava Ovsiienko [off-list ref]
> Cc: dev@dpdk.org; jerinj@marvell.com; nd@arm.com; honnappa.nagarahalli@arm.com; Ruifeng Wang [off-list ref]
> Subject: [PATCH 2/2] net/mlx5: reduce unnecessary memory access
>
> MR btree len is a constant during Rx replenish.
> Moved retrieve of the value out of loop to reduce data loads.
> Slight performance uplift was measured on N1SDP.
>
> Signed-off-by: Ruifeng Wang <redacted>
> ---
>  drivers/net/mlx5/mlx5_rxtx_vec.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/mlx5/mlx5_rxtx_vec.c b/drivers/net/mlx5/mlx5_rxtx_vec.c
> index d5af2d91ff..fc7e2a7f41 100644
> --- a/drivers/net/mlx5/mlx5_rxtx_vec.c
> +++ b/drivers/net/mlx5/mlx5_rxtx_vec.c
> @@ -95,6 +95,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
>  	volatile struct mlx5_wqe_data_seg *wq =
>  		&((volatile struct mlx5_wqe_data_seg *)rxq->wqes)[elts_idx];
>  	unsigned int i;
> +	uint16_t btree_len;
>
>  	if (n >= rxq->rq_repl_thresh) {
>  		MLX5_ASSERT(n >= MLX5_VPMD_RXQ_RPLNSH_THRESH(q_n));
> @@ -106,6 +107,8 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
>  			rxq->stats.rx_nombuf += n;
>  			return;
>  		}
> +
> +		btree_len = mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh);
>  		for (i = 0; i < n; ++i) {
>  			void *buf_addr;
>
> @@ -119,8 +122,7 @@ mlx5_rx_replenish_bulk_mbuf(struct mlx5_rxq_data *rxq)
>  			wq[i].addr = rte_cpu_to_be_64((uintptr_t)buf_addr +
>  						      RTE_PKTMBUF_HEADROOM);
>  			/* If there's a single MR, no need to replace LKey. */
> -			if (unlikely(mlx5_mr_btree_len(&rxq->mr_ctrl.cache_bh)
> -				     > 1))
> +			if (unlikely(btree_len > 1))
>  				wq[i].lkey = mlx5_rx_mb2mr(rxq, elts[i]);
>  		}
>  		rxq->rq_ci += n;
> --
> 2.25.1