Re: Invalid transport_offset with AF_PACKET socket
From: Saeed Mahameed <hidden>
Date: 2018-11-30 12:10:49
On Wed, Nov 28, 2018 at 3:10 AM Maxim Mikityanskiy [off-list ref] wrote:
> Hi Saeed,
>
> > Can you elaborate more: what NIC? What configuration? What do you mean by confusion? Anyway, please see below.
>
> ConnectX-4, after running `mlnx_qos -i eth1 --trust dscp`, which sets inline mode 2 (MLX5_INLINE_MODE_IP). I'll explain what I mean by confusion below.
>
> > In mlx5 with ConnectX-4 or ConnectX-4 Lx there is a requirement to copy at least the Ethernet header to the TX descriptor, otherwise the packet might be dropped. For RAW sockets the skb header offsets are not set, but the latest upstream mlx5 driver knows how to handle this and copies the minimum amount required; please see:
> >
> >     static inline u16 mlx5e_calc_min_inline(enum mlx5_inline_modes mode, struct sk_buff *skb)
>
> Yes, I know that, and what I'm doing is debugging an issue with this function.
>
> > It should default to:
> >
> >     case MLX5_INLINE_MODE_L2:
> >     default:
> >         hlen = mlx5e_skb_l2_header_offset(skb);
>
> The issue appears in MLX5_INLINE_MODE_IP. I haven't tested MLX5_INLINE_MODE_TCP_UDP yet, though.
>
> > So it should return at least 18 and not 14.
>
> Yes, the function does its best to return at least 18, but it silently expects skb_transport_offset to exceed 18. In normal conditions it will be more than 18, because it will be at least 14 + 20. But in my case, when I send a packet via an AF_PACKET socket, skb_transport_offset returns 14 (which is nonsense), and the driver uses this value, causing the hardware to fail, because it's less than 18.

Got it, so even if you copy 18 bytes it is not sufficient! If the packet is IPv4 or IPv6 and the inline mode is set to MLX5_INLINE_MODE_IP in the vport context, you must copy the IP headers as well. But what do you expect from an AF_PACKET socket? To parse each and every packet and set skb_transport_offset?

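To make the trade-off in the reply above concrete, here is a minimal userspace sketch in C of what "parsing each packet" would involve when the transport offset cannot be trusted. It is not the mlx5 driver code: the helper names `l3_header_end` and `min_inline_ip_mode` and the macros are hypothetical, only one 802.1Q tag is handled, and IPv6 extension headers are ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN   14u                          /* dst MAC + src MAC + EtherType */
#define VLAN_TAG_LEN   4u                          /* one 802.1Q tag */
#define MIN_INLINE    (ETH_HDR_LEN + VLAN_TAG_LEN) /* the 18 bytes discussed in the thread */

/* Hypothetical helper: offset of the first byte after the IP header,
 * or 0 if the frame is not a well-formed IPv4/IPv6 packet. */
static size_t l3_header_end(const uint8_t *frame, size_t len)
{
    size_t off = ETH_HDR_LEN;
    uint16_t proto;

    if (len < ETH_HDR_LEN)
        return 0;
    proto = (uint16_t)(frame[12] << 8 | frame[13]);

    /* Skip a single 802.1Q tag if present. */
    if (proto == 0x8100) {
        if (len < ETH_HDR_LEN + VLAN_TAG_LEN)
            return 0;
        proto = (uint16_t)(frame[16] << 8 | frame[17]);
        off += VLAN_TAG_LEN;
    }

    if (proto == 0x0800) {                  /* IPv4: length comes from the IHL field */
        size_t ihl;

        if (len < off + 20 || (frame[off] >> 4) != 4)
            return 0;
        ihl = (size_t)(frame[off] & 0x0f) * 4;
        if (ihl < 20 || len < off + ihl)
            return 0;
        return off + ihl;
    }
    if (proto == 0x86dd) {                  /* IPv6: fixed 40-byte header only */
        if (len < off + 40)
            return 0;
        return off + 40;
    }
    return 0;                               /* not IP: caller falls back to L2 */
}

/* Hypothetical helper: number of bytes to inline in IP mode.
 * Never less than MIN_INLINE, never more than the frame itself. */
static size_t min_inline_ip_mode(const uint8_t *frame, size_t len)
{
    size_t hlen = l3_header_end(frame, len);

    if (hlen < MIN_INLINE)
        hlen = MIN_INLINE;
    return hlen < len ? hlen : len;
}
```

For an untagged IPv4 frame with a 20-byte IP header this yields 14 + 20 = 34; for a non-IP frame it falls back to 18. The cost is a header walk on every packet, which is exactly the overhead questioned above.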
> > We had some issues with this in old drivers, such as kernels 4.14/15, and it depends on the use case, so I need some information first:
>
> No, it's not an old kernel. We actually have this bug in our internal bug tracking system, and I'm trying to resolve it.
>
> > 1. What cards do you have? (lspci)
>
> 03:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
> 03:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]
> 81:00.0 Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
>
> Testing with ConnectX-4.
>
> > 2. What kernel/driver version are you using?
>
> I'm on net-next-mlx5, commit 66a4b5ef638a (the latest when I started the investigation).
>
> > 3. What is the current enum mlx5_inline_modes seen in mlx5e_calc_min_inline or sq->min_inline_mode?
>
> MLX5_INLINE_MODE_IP, as I said above.
>
> > 4. Firmware version? (ethtool -i)
>
> 12.22.0238 (MT_2190110032)
>
> > Can you share the packet format you are sending and seeing the bad behavior with?
>
> Here is the hexdump of the simplest packet that causes the problem when it's sent through AF_PACKET after `mlnx_qos -i eth1 --trust dscp`:
>
> 00000000: 11 22 33 44 55 66 77 88 99 aa bb cc 08 00 45 00
> 00000010: 00 20 00 00 40 00 40 11 ae a5 c6 12 00 01 c6 12
> 00000020: 00 02 00 00 4a 38 00 0c 29 82 61 62 63 64
>
> (Please ignore the wrong UDP checksum and non-existent MACs; they don't matter at all, and I tested with completely valid packets as well. The wrong UDP checksum is due to a bug in our internal pypacket utility.)
>
> Thanks,
> Max
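
For reference, the hexdump above can be decoded by hand. The following C sketch copies the 46 bytes verbatim and reads the fields that matter for inlining; the helper names (`be16`, `ipv4_header_len`, `ipv4_checksum_ok`) are illustrative and not taken from any driver. It assumes an untagged Ethernet II frame, which is what the dump shows.

```c
#include <stddef.h>
#include <stdint.h>

/* The 46-byte frame from the hexdump. */
static const uint8_t frame[] = {
    0x11, 0x22, 0x33, 0x44, 0x55, 0x66,             /* destination MAC */
    0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc,             /* source MAC */
    0x08, 0x00,                                     /* EtherType: IPv4 */
    0x45, 0x00, 0x00, 0x20, 0x00, 0x00, 0x40, 0x00, /* IPv4: IHL 5, total length 32, DF */
    0x40, 0x11, 0xae, 0xa5,                         /* TTL 64, protocol 17 (UDP), header checksum */
    0xc6, 0x12, 0x00, 0x01,                         /* source 198.18.0.1 */
    0xc6, 0x12, 0x00, 0x02,                         /* destination 198.18.0.2 */
    0x00, 0x00, 0x4a, 0x38, 0x00, 0x0c, 0x29, 0x82, /* UDP: ports 0 -> 19000, length 12, checksum */
    0x61, 0x62, 0x63, 0x64                          /* payload "abcd" */
};

/* Read a big-endian 16-bit value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)(p[0] << 8 | p[1]);
}

/* IPv4 header length in bytes, taken from the IHL field. */
static size_t ipv4_header_len(const uint8_t *ip)
{
    return (size_t)(ip[0] & 0x0f) * 4;
}

/* One's-complement sum over the whole IPv4 header, checksum field included.
 * A valid header folds to 0xffff. */
static int ipv4_checksum_ok(const uint8_t *ip)
{
    uint32_t sum = 0;
    size_t i, hlen = ipv4_header_len(ip);

    for (i = 0; i < hlen; i += 2)
        sum += be16(ip + i);
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return sum == 0xffff;
}
```

Read this way, the Ethernet header ends at offset 14 and the IPv4 header (IHL 5, so 20 bytes) ends at offset 34, so 34 is the transport offset one would expect for this frame rather than the 14 reported for the AF_PACKET skb. The IP header checksum itself verifies; only the UDP checksum is off, as noted in the quoted message.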