Thread (38 messages), 5 authors, 2023-11-02

Re: [PATCH bpf-next v6 11/18] ice: put XDP meta sources assignment under a static key condition

From: Larysa Zaremba <hidden>
Date: 2023-11-02 13:49:50
Also in: bpf

On Thu, Nov 02, 2023 at 02:23:02PM +0100, Maciej Fijalkowski wrote:
On Tue, Oct 31, 2023 at 03:22:31PM +0100, Larysa Zaremba wrote:
On Sat, Oct 28, 2023 at 09:55:52PM +0200, Maciej Fijalkowski wrote:
On Mon, Oct 23, 2023 at 11:35:46AM +0200, Larysa Zaremba wrote:
On Fri, Oct 20, 2023 at 06:32:13PM +0200, Maciej Fijalkowski wrote:
On Thu, Oct 12, 2023 at 07:05:17PM +0200, Larysa Zaremba wrote:
Usage of XDP hints requires putting additional information after the
xdp_buff. In the basic case, only the descriptor has to be copied on a
per-packet basis, because the xdp_buff permanently resides before the
per-ring metadata (cached time and VLAN protocol ID).

However, in ZC mode, xdp_buffs come from a pool, so the memory after such
a buffer does not contain any reliable information and everything has to
be copied, which hurts performance.

Introduce a static key to enable meta sources assignment only when the
attached XDP program is device-bound.

This patch eliminates a 6% performance drop in ZC mode, which was a result
of adding XDP hints to the driver.

Signed-off-by: Larysa Zaremba <redacted>
---
 drivers/net/ethernet/intel/ice/ice.h      |  1 +
 drivers/net/ethernet/intel/ice/ice_main.c | 14 ++++++++++++++
 drivers/net/ethernet/intel/ice/ice_txrx.c |  3 ++-
 drivers/net/ethernet/intel/ice/ice_xsk.c  |  3 +++
 4 files changed, 20 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 3d0f15f8b2b8..76d22be878a4 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -210,6 +210,7 @@ enum ice_feature {
 };
 
 DECLARE_STATIC_KEY_FALSE(ice_xdp_locking_key);
+DECLARE_STATIC_KEY_FALSE(ice_xdp_meta_key);
 
 struct ice_channel {
 	struct list_head list;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 47e8920e1727..ee0df86d34b7 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -48,6 +48,9 @@ MODULE_PARM_DESC(debug, "netif level (0=none,...,16=all)");
 DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key);
 EXPORT_SYMBOL(ice_xdp_locking_key);
 
+DEFINE_STATIC_KEY_FALSE(ice_xdp_meta_key);
+EXPORT_SYMBOL(ice_xdp_meta_key);
+
 /**
  * ice_hw_to_dev - Get device pointer from the hardware structure
  * @hw: pointer to the device HW structure
@@ -2634,6 +2637,11 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
 	return -ENOMEM;
 }
 
+static bool ice_xdp_prog_has_meta(struct bpf_prog *prog)
+{
+	return prog && prog->aux->dev_bound;
+}
+
 /**
  * ice_vsi_assign_bpf_prog - set or clear bpf prog pointer on VSI
  * @vsi: VSI to set the bpf prog on
@@ -2644,10 +2652,16 @@ static void ice_vsi_assign_bpf_prog(struct ice_vsi *vsi, struct bpf_prog *prog)
 	struct bpf_prog *old_prog;
 	int i;
 
+	if (ice_xdp_prog_has_meta(prog))
+		static_branch_inc(&ice_xdp_meta_key);
I thought a boolean key would be enough, but inc/dec should properly
serve, for example, prog hotswap cases.
My reasoning for using a counting key instead of a boolean one was that
several PFs can use the same driver, so we need to keep track of how many
of them use hints.
Very good point. This implies that if PF0 has hints-enabled prog loaded,
PF1 with non-hints prog will "suffer" from it.
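
To illustrate the counting behaviour discussed above, here is a minimal
user-space sketch. It is not the kernel static key API: struct meta_key,
struct pf and pf_assign_prog() are hypothetical stand-ins, and the
inc-then-dec ordering is an assumption based on the visible part of the
hunk and the "inc/dec" remark.

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-in for a counting static key (NOT the kernel API). */
struct meta_key {
	int users;	/* number of attached hints-enabled programs */
};

/* Hypothetical, reduced model of one PF and its attached XDP program. */
struct pf {
	bool has_prog;
	bool prog_dev_bound;
};

static bool prog_has_meta(const struct pf *pf)
{
	return pf->has_prog && pf->prog_dev_bound;
}

/* Account for the new program first, then drop the old one (assumed order). */
static void pf_assign_prog(struct meta_key *key, struct pf *pf,
			   bool has_prog, bool dev_bound)
{
	struct pf old = *pf;

	pf->has_prog = has_prog;
	pf->prog_dev_bound = dev_bound;

	if (prog_has_meta(pf))
		key->users++;
	if (prog_has_meta(&old))
		key->users--;

	assert(key->users >= 0);
}

/* The key is shared by all PFs, so one hints user enables it for everyone. */
static bool meta_key_enabled(const struct meta_key *key)
{
	return key->users > 0;
}
```

With one shared key, attaching a device-bound program on PF0 keeps
meta_key_enabled() true for PF1 as well, and a hotswap between two
device-bound programs leaves the count unchanged.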

Sorry for such long delays in responding, but I was having a hard time
making up my mind about it. In the end I have come to some conclusions.
I know the timing for sending this response is not ideal, but I need to
get this off my chest and bring the discussion back to life :)

IMHO having static keys to eliminate ZC overhead does not scale. I assume
every other driver would have to follow that.

The XSK pool allows us to avoid initializing various things for each
packet. Instead, taking xdp_rxq_info as an example, each xdp_buff from the
pool has xdp_rxq_info assigned at init time. With this in mind, we should
have some mechanism to set hints-specific things in xdp_buff_xsk::cb at
init time as well. Such a mechanism should not require us to expose the
driver's private xdp_buff hints containers (such as ice_pkt_ctx) to the
XSK pool.

Right now you moved phctime down to ice_pkt_ctx and to me that's the main
reason we have to copy ice_pkt_ctx to each xdp_buff on ZC. What if we keep
the cached_phctime at original offset in ring but ice_pkt_ctx would get a
pointer to that?

This would allow us to init the pointer in each xdp_buff from the XSK pool
at init time. I have come up with a way to program that via so-called XSK
meta descriptors. Each desc would have the data to write onto cb, the
offset within cb and the number of bytes to write/copy.
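
A rough user-space model of the meta descriptor idea, for illustration
only. The actual xsk_pool_set_meta() signature is not shown in this
message, so struct meta_desc, struct pool_buff, CB_SIZE and
model_pool_set_meta() below are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CB_SIZE 24	/* assumed size of the per-buffer cb area */

/* Hypothetical pool buffer: only the driver-private cb area is modelled. */
struct pool_buff {
	uint8_t cb[CB_SIZE];
};

/* Hypothetical meta descriptor: what to write, where in cb, how much. */
struct meta_desc {
	const void *data;
	size_t off;
	size_t bytes;
};

/*
 * Apply one descriptor to every buffer in the pool, once, at init time.
 * Returns 0 on success, -1 if the write would run past the end of cb.
 */
static int model_pool_set_meta(struct pool_buff *buffs, size_t n,
			       const struct meta_desc *desc)
{
	assert(desc && desc->data);

	if (desc->off > CB_SIZE || desc->bytes > CB_SIZE - desc->off)
		return -1;

	for (size_t i = 0; i < n; i++)
		memcpy(buffs[i].cb + desc->off, desc->data, desc->bytes);

	return 0;
}

/* Read back a uint64_t through the pointer stored at cb + off. */
static uint64_t model_read_time(const struct pool_buff *b, size_t off)
{
	const uint64_t *p;

	memcpy(&p, b->cb + off, sizeof(p));
	return *p;
}
```

If the descriptor's data is a pointer to a per-ring cached time, every
buffer is written once at init and later updates to the cached value are
visible through model_read_time() without touching the buffers again.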

I'll share the diff below, but note that I didn't measure its effect on
performance. The Icelake machine I used for measuring
performance-sensitive code broke. For now we can't escape initing
eop_desc for each xdp_buff, but I moved it to the alloc side, as we mangle
descs there anyway.

I think mlx5 could benefit from that approach as well with initing the rq
ptr at init time.

Diff does mostly these things:
- move cached_phctime to old place in ice_rx_ring and add ptr to that in
  ice_pkt_ctx
- introduce xsk_pool_set_meta()
- use it from ice side.
Thank you for the code! I will probably send v7 with such changes. Are you
OK with the patch containing the core changes going in with you as the
author?
Yes, or I can produce a patch and share it; up to you.
I have already started. Your diff does not compile, so I took some
creative liberty. I will send you the patches for verification this week.
 
But I also see a minor problem: switching the VLAN protocol does not
trigger buffer allocation, so we have to point to that too. This probably
means moving the cached time back and finding 16 extra bits in CL3. A
single pointer to {cached time, vlan_proto} would be copied after the
xdp_buff.
It's not that it has to trigger buffer allocation. We could stop the
interface if a pool is present and update the VLAN proto on the pool's
xdp_buffs (from a quick glance I don't see that we're stopping the iface
for setting VLAN features), but that sounds like more of a hassle to do...

So yeah maybe let's just have a ptr in ice_pkt_ctx as well.
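
As a sketch of why a single pointer sidesteps the VLAN protocol problem,
consider the following user-space model. struct ring_meta and struct
pkt_ctx are hypothetical and only loosely mirror the driver's structures;
field names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-ring metadata that stays in the ring. */
struct ring_meta {
	uint64_t cached_time;
	uint16_t vlan_proto;
};

/* Hypothetical per-buffer context placed after the xdp_buff. */
struct pkt_ctx {
	const void *eop_desc;		/* still set for each packet */
	const struct ring_meta *meta;	/* set once, at buffer init time */
};

/* Readers always go through the pointer, so they see the current value. */
static uint16_t ctx_vlan_proto(const struct pkt_ctx *ctx)
{
	assert(ctx->meta);
	return ctx->meta->vlan_proto;
}
```

Because each buffer holds only a pointer, changing vlan_proto in the ring
is observed by all buffers at once, with no need to reallocate or rewrite
them.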

[...]