Thread (54 messages) 54 messages, 11 authors, 2022-12-01

Re: [xdp-hints] [PATCH bpf-next v2 6/8] mlx4: Introduce mlx4_xdp_buff wrapper for xdp_buff

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: 2022-11-23 21:55:04
Also in: bpf

On Wed, Nov 23, 2022 at 11:52:12AM -0800, sdf@google.com wrote:
On 11/23, Jakub Kicinski wrote:
quoted
On Wed, 23 Nov 2022 10:26:41 -0800 Stanislav Fomichev wrote:
quoted
quoted
This embedding trick works for drivers that put xdp_buff on the stack,
but mlx5 supports XSK zerocopy, which uses the xsk_buff_pool for
allocating them. This makes it a bit awkward to do the same thing
there;
quoted
quoted
and since it's probably going to be fairly common to do something like
this, how about we just add a 'void *drv_priv' pointer to struct
xdp_buff that the drivers can use? The xdp_buff already takes up a
full
quoted
quoted
cache line anyway, so any data stuffed after it will spill over to a
new
quoted
quoted
one; so I don't think there's much difference performance-wise.
I guess the alternative is to extend xsk_buff_pool with some new
argument for xdp_buff tailroom? (so it can kmalloc(sizeof(xdp_buff) +
xdp_buff_tailroom))
But it seems messy because there is no way of knowing what the target
device's tailroom is, so it has to be a user setting :-/
I've started with a priv pointer in xdp_buff initially, it seems fine
to go back. I'll probably convert veth/mlx4 to the same mode as well
to avoid having different approaches in different places..
quoted
Can we not do this please? Add 16B of "private driver space" after
the xdp_buff in xdp_buff_xsk (we have 16B to full cacheline), the
It is time to jump the hints train I guess:D

We have 8 bytes left in the cacheline that xdp_buff occupies - pahole
output below shows that cb spans through two cachelines. Did you mean
something else though?
quoted hunk ↗ jump to hunk
quoted
drivers decide how they use it. Drivers can do BUILD_BUG_ON() for their
expected size and cast that to whatever struct they want. This is how
various offloads work, the variable size tailroom would be an over
design IMO.
quoted
And this way non XSK paths can keep its normal typing.
Good idea, prototyped below, lmk if it that's not what you had in mind.

struct xdp_buff_xsk {
	struct xdp_buff            xdp;                  /*     0    56 */
	u8                         cb[16];               /*    56    16 */
	/* --- cacheline 1 boundary (64 bytes) was 8 bytes ago --- */
	dma_addr_t                 dma;                  /*    72     8 */
	dma_addr_t                 frame_dma;            /*    80     8 */
	struct xsk_buff_pool *     pool;                 /*    88     8 */
	u64                        orig_addr;            /*    96     8 */
	struct list_head           free_list_node;       /*   104    16 */

	/* size: 120, cachelines: 2, members: 7 */
	/* last cacheline: 56 bytes */
};

Toke, I can try to merge this into your patch + keep your SoB (or feel free
to try this and retest yourself, whatever works).
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
index bc2d9034af5b..837bf103b871 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.h
@@ -44,6 +44,11 @@
 	(MLX5E_XDP_INLINE_WQE_MAX_DS_CNT * MLX5_SEND_WQE_DS - \
 	 sizeof(struct mlx5_wqe_inline_seg))

+struct mlx5_xdp_cb {
+	struct mlx5_cqe64 *cqe;
+	struct mlx5e_rq *rq;
+};
+
 struct mlx5e_xsk_param;
 int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param
*xsk);
 bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct page *page,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
index c91b54d9ff27..84d23b2da7ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c
@@ -5,6 +5,7 @@
 #include "en/xdp.h"
 #include <net/xdp_sock_drv.h>
 #include <linux/filter.h>
+#include <linux/build_bug.h>

 /* RX data path */
@@ -286,8 +287,14 @@ struct sk_buff *mlx5e_xsk_skb_from_cqe_linear(struct
mlx5e_rq *rq,
 					      u32 cqe_bcnt)
 {
 	struct xdp_buff *xdp = wi->au->xsk;
+	struct mlx5_xdp_cb *cb;
 	struct bpf_prog *prog;

+	BUILD_BUG_ON(sizeof(struct mlx5_xdp_cb) > XSKB_CB_SIZE);
+	cb = xp_get_cb(xdp);
+	cb->cqe = NULL /*cqe*/;
+	cb->rq = rq;
I believe that these could be set once at a setup time within a pool -
take a look at xsk_pool_set_rxq_info(). This will save us cycles so that
we will skip assignments per each processed xdp_buff.

AF_XDP ZC performance comes in a major part from the fact that thanks to
xsk_buff_pool we have less work to do per each processed buffer.
quoted hunk ↗ jump to hunk
+
 	/* wi->offset is not used in this function, because xdp->data and the
 	 * DMA address point directly to the necessary place. Furthermore, the
 	 * XSK allocator allocates frames per packet, instead of pages, so
diff --git a/include/net/xsk_buff_pool.h b/include/net/xsk_buff_pool.h
index f787c3f524b0..b298590429e7 100644
--- a/include/net/xsk_buff_pool.h
+++ b/include/net/xsk_buff_pool.h
@@ -19,8 +19,11 @@ struct xdp_sock;
 struct device;
 struct page;

+#define XSKB_CB_SIZE 16
+
 struct xdp_buff_xsk {
 	struct xdp_buff xdp;
+	u8 cb[XSKB_CB_SIZE]; /* Private area for the drivers to use. */
 	dma_addr_t dma;
 	dma_addr_t frame_dma;
 	struct xsk_buff_pool *pool;
@@ -143,6 +146,11 @@ static inline dma_addr_t xp_get_frame_dma(struct
xdp_buff_xsk *xskb)
 	return xskb->frame_dma;
 }

+static inline void *xp_get_cb(struct xdp_buff *xdp)
+{
+	return (void *)xdp + offsetof(struct xdp_buff_xsk, cb);
+}
This should have a wrapper in include/net/xdp_sock_drv.h that drivers will
call.

Generally I think this should fly but I'm not sure about cb being 16
bytes.
+
 void xp_dma_sync_for_cpu_slow(struct xdp_buff_xsk *xskb);
 static inline void xp_dma_sync_for_cpu(struct xdp_buff_xsk *xskb)
 {
quoted
quoted
quoted
I'll send my patch to add support to mlx5 (using the drv_priv pointer
approach) separately.
Saw them, thanks! Will include them in v3+.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help