Thread (16 messages) 16 messages, 3 authors, 2021-07-13

Re: [dpdk-dev] [PATCH 1/2] net/mlx5: remove redundant operations

From: Slava Ovsiienko <hidden>
Date: 2021-07-02 08:12:42

Hi, Ruifeng
quoted hunk ↗ jump to hunk
-----Original Message-----
From: Ruifeng Wang <redacted>
Sent: Tuesday, June 1, 2021 11:31
To: Raslan Darawsheh <redacted>; Matan Azrad
[off-list ref]; Shahaf Shuler [off-list ref]; Slava Ovsiienko
[off-list ref]
Cc: dev@dpdk.org; jerinj@marvell.com; nd@arm.com;
honnappa.nagarahalli@arm.com; Ruifeng Wang [off-list ref];
stable@dpdk.org
Subject: [PATCH 1/2] net/mlx5: remove redundant operations

Some operations on mask are redundant and can be removed.
The change yielded 1.6% performance gain on N1SDP.
On ThunderX2, slight performance uplift was also observed.

Fixes: 570acdb1da8a ("net/mlx5: add vectorized Rx/Tx burst for ARM")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <redacted>
---
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 2234fbe6b2..98a75b09c6 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -768,18 +768,11 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq,
volatile struct mlx5_cqe *cq,
 					  comp_mask), 0)) /
 					  (sizeof(uint16_t) * 8);
 		/* D.6 mask out entries after the compressed CQE. */
-		mask = vcreate_u16(comp_idx <
MLX5_VPMD_DESCS_PER_LOOP ?
-				   -1UL >> (comp_idx * sizeof(uint16_t) * 8) :
-				   0);
-		invalid_mask = vorr_u16(invalid_mask, mask);
+		invalid_mask = vorr_u16(invalid_mask, comp_mask);
Mmmm... I'm not sure we can drop the masking compressed (and following) CQE skip.
Let's consider the completion scenario (the series of 4 CQEs, each element is 64B long)

0: normal uncompressed CQE, ownership OK, format uncompressed, opcode OK, no error
1: compressed CQE, ownership OK, format compressed, opcode OK, no error
2: miniCQE array, format can be any!!, may be discovered as ownership OK, format uncompressed, opcode OK, no error
3: miniCQE array, format can be any!!, may be discovered as ownership OK, format uncompressed, opcode OK, no error

Obviously, we should unconditionally mask out 2 and 3, regardless of recognized their formats/opcode/error/etc.
I think we can get the diff above and skip diff below:
 		/* D.7 count non-compressed valid CQEs. */
 		n = __builtin_clzl(vget_lane_u64(vreinterpret_u64_u16(
 				   invalid_mask), 0)) / (sizeof(uint16_t) * 8);
 		nocmp_n += n;
-		/* D.2 get the final invalid mask. */
-		mask = vcreate_u16(n < MLX5_VPMD_DESCS_PER_LOOP ?
-				   -1UL >> (n * sizeof(uint16_t) * 8) : 0);
-		invalid_mask = vorr_u16(invalid_mask, mask);
and get the correct final invalid_mask - all compressed and invalid CQEs and following ones will be masked out.

With best regards,
Slava
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help