Re: [PATCH bpf-next v6 11/18] ice: put XDP meta sources assignment under a static key condition
From: Larysa Zaremba <hidden>
Date: 2023-11-02 13:49:50
Also in:
bpf
On Thu, Nov 02, 2023 at 02:23:02PM +0100, Maciej Fijalkowski wrote:
On Tue, Oct 31, 2023 at 03:22:31PM +0100, Larysa Zaremba wrote:
On Sat, Oct 28, 2023 at 09:55:52PM +0200, Maciej Fijalkowski wrote:
On Mon, Oct 23, 2023 at 11:35:46AM +0200, Larysa Zaremba wrote:
On Fri, Oct 20, 2023 at 06:32:13PM +0200, Maciej Fijalkowski wrote:
On Thu, Oct 12, 2023 at 07:05:17PM +0200, Larysa Zaremba wrote:
Usage of XDP hints requires putting additional information after the
xdp_buff. In basic case, only the descriptor has to be copied on a
per-packet basis, because xdp_buff permanently resides before per-ring
metadata (cached time and VLAN protocol ID).

However, in ZC mode, xdp_buffs come from a pool, so memory after such
buffer does not contain any reliable information, so everything has to be
copied, damaging the performance.

Introduce a static key to enable meta sources assignment only when attached
XDP program is device-bound.

This patch eliminates a 6% performance drop in ZC mode, which was a result
of addition of XDP hints to the driver.

Signed-off-by: Larysa Zaremba <redacted>
---
 drivers/net/ethernet/intel/ice/ice.h      |  1 +
 drivers/net/ethernet/intel/ice/ice_main.c | 14 ++++++++++++++
 drivers/net/ethernet/intel/ice/ice_txrx.c |  3 ++-
 drivers/net/ethernet/intel/ice/ice_xsk.c  |  3 +++
 4 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 3d0f15f8b2b8..76d22be878a4 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -210,6 +210,7 @@ enum ice_feature {
 };
 
 DECLARE_STATIC_KEY_FALSE(ice_xdp_locking_key);
+DECLARE_STATIC_KEY_FALSE(ice_xdp_meta_key);
 
 struct ice_channel {
 	struct list_head list;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 47e8920e1727..ee0df86d34b7 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -48,6 +48,9 @@ MODULE_PARM_DESC(debug, "netif level (0=none,...,16=all)");
 DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key);
 EXPORT_SYMBOL(ice_xdp_locking_key);
 
+DEFINE_STATIC_KEY_FALSE(ice_xdp_meta_key);
+EXPORT_SYMBOL(ice_xdp_meta_key);
+
 /**
  * ice_hw_to_dev - Get device pointer from the hardware structure
  * @hw: pointer to the device HW structure
@@ -2634,6 +2637,11 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
 	return -ENOMEM;
 }
 
+static bool ice_xdp_prog_has_meta(struct bpf_prog *prog)
+{
+	return prog && prog->aux->dev_bound;
+}
+
 /**
  * ice_vsi_assign_bpf_prog - set or clear bpf prog pointer on VSI
  * @vsi: VSI to set the bpf prog on
@@ -2644,10 +2652,16 @@ static void ice_vsi_assign_bpf_prog(struct ice_vsi *vsi, struct bpf_prog *prog)
 	struct bpf_prog *old_prog;
 	int i;
 
+	if (ice_xdp_prog_has_meta(prog))
+		static_branch_inc(&ice_xdp_meta_key);

I thought a boolean key would be enough, but inc/dec should serve properly, for example in prog hotswap cases.

My thought process on using counting instead of boolean was: there can be several PFs that use the same driver, so we need to keep track of how many of them use hints.

Very good point. This implies that if PF0 has a hints-enabled prog loaded, PF1 with a non-hints prog will "suffer" from it.

Sorry for such long delays in responses, but I was having a hard time making up my mind about it. In the end I have come to some conclusions. I know the timing for sending this response is not ideal, but I need to get this off my chest and bring the discussion back to life :)

IMHO having static keys to eliminate ZC overhead does not scale. I assume every other driver would have to follow that.

XSK pool allows us to avoid initializing various things per each packet. Instead, taking xdp_rxq_info as an example, each xdp_buff from the pool has xdp_rxq_info assigned at init time. With this in mind, we should have some mechanism to set hints-specific things in xdp_buff_xsk::cb at init time as well. Such a mechanism should not require us to expose the driver's private xdp_buff hints containers (such as ice_pkt_ctx) to the XSK pool.

Right now you moved phctime down to ice_pkt_ctx, and to me that's the main reason we have to copy ice_pkt_ctx to each xdp_buff on ZC. What if we keep the cached_phctime at its original offset in the ring, but ice_pkt_ctx would get a pointer to that? This would allow us to init the pointer in each xdp_buff from the XSK pool at init time.
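To illustrate the counting-versus-boolean point raised earlier in this exchange, here is a minimal userspace sketch, not taken from the patch. It models the key as a plain counter shared by several PFs; `struct meta_key`, `struct prog` and `pf_assign_prog()` are hypothetical names, and the real kernel jump-label machinery is not reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a counting key; NOT the kernel static-key API. */
struct meta_key {
	int users;	/* attached programs that want hints, across all PFs */
};

/* Hypothetical program descriptor; only the dev_bound flag matters here. */
struct prog {
	bool dev_bound;
};

static bool prog_has_meta(const struct prog *p)
{
	return p && p->dev_bound;
}

/*
 * Swap the program attached to one PF. The new program is counted before
 * the old one is dropped, so a hints-to-hints hotswap never lets the
 * count reach zero in between.
 */
static void pf_assign_prog(struct meta_key *key, const struct prog **slot,
			   const struct prog *new_prog)
{
	if (prog_has_meta(new_prog))
		key->users++;
	if (prog_has_meta(*slot)) {
		assert(key->users > 0);
		key->users--;
	}
	*slot = new_prog;
}

/* The shared key is "on" while at least one PF uses hints. */
static bool meta_key_enabled(const struct meta_key *key)
{
	return key->users > 0;
}
```

With a single shared counter, attaching a hints-enabled program on one PF turns the key on for every PF, which is the "PF1 will suffer" effect described above.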
I have come up with a way to program that via so-called XSK meta descriptors. Each desc would have data to write onto cb, offset within cb and amount of bytes to write/copy.

I'll share the diff below, but note that I didn't measure how much the performance is degraded. The Icelake machine where I used to measure performance-sensitive code broke. For now we can't escape initing eop_desc per each xdp_buff, but I moved it to the alloc side, as we mangle descs there anyway.

I think mlx5 could benefit from that approach as well, with initing the rq ptr at init time.

Diff does mostly these things:
- move cached_phctime to old place in ice_rx_ring and add ptr to that in ice_pkt_ctx
- introduce xsk_pool_set_meta()
- use it from ice side.

Thank you for the code! I will probably send v7 with such changes. Are you OK if the patch with core changes goes with you as the author?

Yes, or I can produce a patch and share, up to you.
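The meta descriptor idea described above (data, offset within cb, number of bytes) can be sketched in userspace C as follows. This is only an illustration under stated assumptions: `struct meta_desc`, `struct pool_buf`, `pool_set_meta()` and the `CB_SIZE` value are hypothetical and do not reflect the actual diff or the real signature of xsk_pool_set_meta().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CB_SIZE 24	/* assumed size of the per-buffer scratch area */

/* Hypothetical descriptor: what to write into cb, where, and how much. */
struct meta_desc {
	const void *data;	/* source bytes */
	size_t off;		/* offset within cb */
	size_t bytes;		/* number of bytes to copy */
};

/* Stand-in for a pool buffer; only the cb scratch area is modelled. */
struct pool_buf {
	unsigned char cb[CB_SIZE];
};

/*
 * Apply one descriptor to every buffer in the pool, once, at init time,
 * so nothing has to be copied per packet on the hot path.
 * Returns 0 on success, -1 if the write would run past cb.
 */
static int pool_set_meta(struct pool_buf *bufs, size_t n,
			 const struct meta_desc *desc)
{
	size_t i;

	assert(bufs && desc && desc->data);

	if (desc->off > CB_SIZE || desc->bytes > CB_SIZE - desc->off)
		return -1;

	for (i = 0; i < n; i++)
		memcpy(bufs[i].cb + desc->off, desc->data, desc->bytes);

	return 0;
}
```

A driver-side caller would pass, for example, the address of a per-ring field as `data`, so that each buffer ends up holding a pointer to it without the pool knowing the driver's private layout.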
I have already started, your diff does not compile, so I took some creative liberty. Will send you patches for verification this week.
But also, I see a minor problem: switching VLAN protocol does not trigger buffer allocation, so we have to point to that too. This probably means moving cached time back and finding 16 extra bits in CL3. A single pointer to {cached time, vlan_proto} would be copied after the xdp_buff.

It's not that it has to trigger buffer allocation; we could stop the interface if a pool is present and update vlan proto on the pool's xdp_buffs (from a quick glance I don't see that we're stopping the iface for setting vlan features), but that sounds like more of a hassle to do... So yeah, maybe let's just have a ptr in ice_pkt_ctx as well.

[...]
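The reason a pointer avoids the VLAN-protocol problem can be shown with a small userspace sketch. It is not driver code: `struct ring_meta` and `struct pkt_ctx` are hypothetical stand-ins for the per-ring {cached time, vlan_proto} pair and for ice_pkt_ctx, and the field layout is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-ring metadata that changes rarely. */
struct ring_meta {
	uint64_t cached_time;
	uint16_t vlan_proto;
};

/* Hypothetical per-buffer context: holds a pointer, not a copy. */
struct pkt_ctx {
	const struct ring_meta *ring;
};

/* Done once per buffer at init time. */
static void pkt_ctx_init(struct pkt_ctx *ctx, const struct ring_meta *ring)
{
	assert(ctx && ring);
	ctx->ring = ring;
}

/* Hot-path read: always sees the ring's current value. */
static uint16_t pkt_ctx_vlan_proto(const struct pkt_ctx *ctx)
{
	return ctx->ring->vlan_proto;
}
```

Because each buffer stores only the pointer, a later change to `vlan_proto` in the ring is visible through every already-initialized buffer, with no need to stop the interface or touch the pool's buffers again.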