Re: [xdp-hints] Re: [Intel-wired-lan] [PATCH bpf-next V1] igc: enable and fix RX hash usage by netstack
From: Jesper Dangaard Brouer <hidden>
Date: 2023-02-16 13:30:43
Also in:
bpf, intel-wired-lan
On 14/02/2023 16.00, Paul Menzel wrote:
Thank you very much for your patch.
Thanks for your review :-)
Am 10.02.23 um 16:07 schrieb Jesper Dangaard Brouer:quoted
When function igc_rx_hash() was introduced in v4.20 via commit 0507ef8a0372 ("igc: Add transmit and receive fastpath and interrupt handlers"), the hardware wasn't configured to provide RSS hash, thus it made sense to not enable net_device NETIF_F_RXHASH feature bit. The NIC hardware was configured to enable RSS hash info in v5.2 via commit 2121c2712f82 ("igc: Add multiple receive queues control supporting"), but forgot to set the NETIF_F_RXHASH feature bit. The original implementation of igc_rx_hash() didn't extract the associated pkt_hash_type, but statically set PKT_HASH_TYPE_L3. The largest portions of this patch are about extracting the RSS Type from the hardware and mapping this to enum pkt_hash_types. This were based on Foxville i225 software users/This were/This was/
Fixed for V2
quoted
manual rev-1.3.1 and tested on Intel Ethernet Controller I225-LM (rev 03). For UDP it's worth noting that RSS (type) hashing have been disabled both for IPv4 and IPv6 (see IGC_MRQC_RSS_FIELD_IPV4_UDP + IGC_MRQC_RSS_FIELD_IPV6_UDP) because hardware RSS doesn't handle fragmented pkts well when enabled (can cause out-of-order). This result in PKT_HASH_TYPE_L3 for UDP packets, andresult*s*
Fixed for V2
quoted
hash value doesn't include UDP port numbers. Not being PKT_HASH_TYPE_L4, have the effect that netstack will do a software based hash calc calling into flow_dissect, but only when code calls skb_get_hash(), which doesn't necessary happen for local delivery.Excuse my ignorance, but is that bug visible in practice by users (performance?) or is that fix needed for future work?quoted
Fixes: 2121c2712f82 ("igc: Add multiple receive queues control supporting") Signed-off-by: Jesper Dangaard Brouer <redacted> --- drivers/net/ethernet/intel/igc/igc.h | 52 +++++++++++++++++++++++++++++ drivers/net/ethernet/intel/igc/igc_main.c | 35 +++++++++++++++++--- 2 files changed, 83 insertions(+), 4 deletions(-)diff --git a/drivers/net/ethernet/intel/igc/igc.hb/drivers/net/ethernet/intel/igc/igc.h index df3e26c0cf01..a112eeb59525 100644--- a/drivers/net/ethernet/intel/igc/igc.h +++ b/drivers/net/ethernet/intel/igc/igc.h@@ -311,6 +311,58 @@ extern char igc_driver_name[];#define IGC_MRQC_RSS_FIELD_IPV4_UDP 0x00400000 #define IGC_MRQC_RSS_FIELD_IPV6_UDP 0x00800000 +/* RX-desc Write-Back format RSS Type's */ +enum igc_rss_type_num { + IGC_RSS_TYPE_NO_HASH = 0, + IGC_RSS_TYPE_HASH_TCP_IPV4 = 1, + IGC_RSS_TYPE_HASH_IPV4 = 2, + IGC_RSS_TYPE_HASH_TCP_IPV6 = 3, + IGC_RSS_TYPE_HASH_IPV6_EX = 4, + IGC_RSS_TYPE_HASH_IPV6 = 5, + IGC_RSS_TYPE_HASH_TCP_IPV6_EX = 6, + IGC_RSS_TYPE_HASH_UDP_IPV4 = 7, + IGC_RSS_TYPE_HASH_UDP_IPV6 = 8, + IGC_RSS_TYPE_HASH_UDP_IPV6_EX = 9, + IGC_RSS_TYPE_MAX = 10, +}; +#define IGC_RSS_TYPE_MAX_TABLE 16 +#define IGC_RSS_TYPE_MASK 0xF + +/* igc_rss_type - Rx descriptor RSS type field */ +static inline u8 igc_rss_type(union igc_adv_rx_desc *rx_desc) +{ + /* RSS Type 4-bit number: 0-9 (above 9 is reserved) */ + return rx_desc->wb.lower.lo_dword.hs_rss.pkt_info & IGC_RSS_TYPE_MASK; +}Is it necessary to specficy the length of the return value, or could it be `unsigned int`. Using “native” types is normally more performant [1]. `scripts/bloat-o-meter` might help to verify that.
Thanks for the link[1]. Alex/Olek also pointed this out. The Agner's instruction latency tables[2] do indicate the latency is slightly higher for r8 and r16 (and m8/m16). And we likely need to look at the zero-extend variants movzx. I think we should investigate this with "tool" godbolt.org as scripts/bloat-o-meter will only tell us about code size. I will experiment a bit and report back :-) [2] https://www.agner.org/optimize/instruction_tables.pdf
[…]quoted
static inline void igc_rx_hash(struct igc_ring *ring, union igc_adv_rx_desc *rx_desc, struct sk_buff *skb) { - if (ring->netdev->features & NETIF_F_RXHASH) - skb_set_hash(skb, - le32_to_cpu(rx_desc->wb.lower.hi_dword.rss), - PKT_HASH_TYPE_L3); + if (ring->netdev->features & NETIF_F_RXHASH) { + u32 rss_hash = le32_to_cpu(rx_desc->wb.lower.hi_dword.rss); + u8 rss_type = igc_rss_type(rx_desc);Amongst others, also here.
Do notice I expect compiler to optimize this, such that is doesn't place this variable on the stack.
quoted
+ enum pkt_hash_types hash_type; + + hash_type = igc_rss_type_table[rss_type].hash_type; + skb_set_hash(skb, rss_hash, hash_type); + } } static void igc_rx_vlan(struct igc_ring *rx_ring,@@ -6501,6 +6527,7 @@ static int igc_probe(struct pci_dev *pdev,netdev->features |= NETIF_F_TSO; netdev->features |= NETIF_F_TSO6; netdev->features |= NETIF_F_TSO_ECN; + netdev->features |= NETIF_F_RXHASH; netdev->features |= NETIF_F_RXCSUM; netdev->features |= NETIF_F_HW_CSUM; netdev->features |= NETIF_F_SCTP_CRC;Kind regards, Paul [1]: https://notabs.org/coding/smallIntsBigPenalty.htm