Thread: 11 messages, 4 authors, 2022-01-05

Re: [PATCH bpf-next v2 3/4] ice: xsk: improve AF_XDP ZC Tx and use batching API

From: Alexander Lobakin <hidden>
Date: 2021-12-29 13:13:20
Also in: netdev

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Thu, 16 Dec 2021 14:59:57 +0100
Follow mostly the logic from commit 9610bd988df9 ("ice: optimize XDP_TX
workloads") that has been done in order to address the massive tx_busy
statistic bump and improve the performance as well.

Increase the ICE_TX_THRESH to 64 as it seems to work out better for both
XDP and AF_XDP. Also, separating the stats structs onto separate cache
lines seemed to improve the performance. Batching approach is inspired
by i40e's implementation with adjustments to the cleaning logic.

One difference from 'xdpdrv' XDP_TX is that when the ring has fewer than
ICE_TX_THRESH free entries, the cleaning routine does not stop after
cleaning a single batch of ICE_TX_THRESH descs. Instead, it advances the
next_dd pointer and checks the DD bit there; if the bit is set, the
cleaning is repeated. IOW, keep cleaning as long as there are descs that
can be cleaned.
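
The repeated-cleaning behaviour described above can be illustrated with a toy
model. This is only a sketch, not the driver code: struct toy_ring, toy_clean(),
RING_SIZE and TX_THRESH are hypothetical names, and the wrap-around rule for
next_dd is an assumption made for the illustration.

```c
#include <stdbool.h>

#define RING_SIZE 256u	/* hypothetical ring size */
#define TX_THRESH 64u	/* stands in for ICE_TX_THRESH */

/* Toy Tx ring: one "descriptor done" (DD) flag per slot. */
struct toy_ring {
	bool dd[RING_SIZE];
	unsigned int next_dd;	/* slot whose DD flag is checked next */
	unsigned int cleaned;	/* total descriptors reclaimed so far */
};

/*
 * Clean in batches of TX_THRESH for as long as the DD flag at next_dd
 * is set, instead of stopping after the first batch.
 * Returns the number of batches cleaned.
 */
static unsigned int toy_clean(struct toy_ring *r)
{
	unsigned int batches = 0;

	while (r->dd[r->next_dd]) {
		r->dd[r->next_dd] = false;	/* consume the DD marker */
		r->cleaned += TX_THRESH;
		r->next_dd += TX_THRESH;	/* move on to the next batch boundary */
		if (r->next_dd >= RING_SIZE)
			r->next_dd = TX_THRESH - 1;	/* assumed wrap-around rule */
		batches++;
	}

	return batches;
}
```

With DD set at slots 63 and 127 and next_dd starting at 63, a single call
cleans two batches and leaves next_dd at 191.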

It takes three separate xdpsock instances in txonly mode to achieve the
line rate and this was not previously possible.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c |   2 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h |   4 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c  | 249 ++++++++++++++--------
 drivers/net/ethernet/intel/ice/ice_xsk.h  |  26 ++-
 4 files changed, 182 insertions(+), 99 deletions(-)
-- 8< --
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
index 4c7bd8e9dfc4..f2eb99063c1f 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.h
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
@@ -6,19 +6,36 @@
 #include "ice_txrx.h"
 #include "ice.h"
 
+#define PKTS_PER_BATCH 8
+
+#ifdef __clang__
+#define loop_unrolled_for _Pragma("clang loop unroll_count(8)") for
+#elif __GNUC__ >= 4
+#define loop_unrolled_for _Pragma("GCC unroll 8") for
+#else
+#define loop_unrolled_for for
+#endif
Unroll pragmas like this are used in a bunch more places across the
tree, so what about defining the macro in linux/compiler{,_clang,_gcc}.h?
Is it possible to pass '8' as an argument? Like

	loop_unrolled_for(PKTS_PER_BATCH) ( ; ; ) { }

Could be quite handy.
If it is not, I'd maybe try to define a couple of precoded macros
for 8, 16 and 32, like

#define loop_unrolled_for_8 ...
#define loop_unrolled_for_16 ...
...

That way they could be used generically. I don't think I've seen
unroll counts other than 8-32.
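
For illustration, a parameterised variant could look roughly like the sketch
below. It relies on _Pragma() taking a string literal, so the pragma text is
built with the preprocessor's # operator. unroll_pragma() and sum_batch() are
hypothetical names, and the GCC version check assumes "#pragma GCC unroll" is
only available from GCC 8 on. This is not a proposal for the actual contents
of linux/compiler*.h.

```c
#define PKTS_PER_BATCH 8

/*
 * Turn the pragma text into the string literal _Pragma() expects.
 * The count is macro-expanded when loop_unrolled_for() substitutes it,
 * so PKTS_PER_BATCH arrives here as 8 before # stringifies it.
 */
#define unroll_pragma(x) _Pragma(#x)

#if defined(__clang__)
#define loop_unrolled_for(n) unroll_pragma(clang loop unroll_count(n)) for
#elif defined(__GNUC__) && __GNUC__ >= 8
#define loop_unrolled_for(n) unroll_pragma(GCC unroll n) for
#else
#define loop_unrolled_for(n) for	/* plain loop, no unroll hint */
#endif

/* Fixed-count wrappers, in case a bare keyword-like form is preferred. */
#define loop_unrolled_for_8	loop_unrolled_for(8)
#define loop_unrolled_for_16	loop_unrolled_for(16)
#define loop_unrolled_for_32	loop_unrolled_for(32)

/* Example user: sum the lengths of one batch of packets. */
static unsigned int sum_batch(const unsigned int *lens)
{
	unsigned int total = 0;
	unsigned int i;

	loop_unrolled_for(PKTS_PER_BATCH) (i = 0; i < PKTS_PER_BATCH; i++)
		total += lens[i];

	return total;
}
```

The count has to be an integer constant by the time the pragma is seen, so a
macro such as PKTS_PER_BATCH works but a runtime variable would not.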
+
 struct ice_vsi;
 
 #ifdef CONFIG_XDP_SOCKETS
 int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
 		       u16 qid);
 int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget);
-bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget);
 int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
 bool ice_alloc_rx_bufs_zc(struct ice_rx_ring *rx_ring, u16 count);
 bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi);
 void ice_xsk_clean_rx_ring(struct ice_rx_ring *rx_ring);
 void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring);
+bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, u32 budget);
 #else
+static inline bool
+ice_xmit_zc(struct ice_tx_ring __always_unused *xdp_ring,
+	    u32 __always_unused budget)
+{
+	return false;
+}
+
 static inline int
 ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
 		   struct xsk_buff_pool __always_unused *pool,
@@ -34,13 +51,6 @@ ice_clean_rx_irq_zc(struct ice_rx_ring __always_unused *rx_ring,
 	return 0;
 }
 
-static inline bool
-ice_clean_tx_irq_zc(struct ice_tx_ring __always_unused *xdp_ring,
-		    int __always_unused budget)
-{
-	return false;
-}
-
 static inline bool
 ice_alloc_rx_bufs_zc(struct ice_rx_ring __always_unused *rx_ring,
 		     u16 __always_unused count)
-- 
2.33.1
Thanks,
Al