Re: Latest net-next kernel 4.19.0+
From: Eric Dumazet <hidden>
Date: 2018-10-30 12:44:26
On 10/29/2018 07:53 PM, Eric Dumazet wrote:
On 10/29/2018 07:27 PM, Cong Wang wrote:
Hi,
On Mon, Oct 29, 2018 at 5:19 PM Paweł Staszewski [off-list ref] wrote:
Sorry not complete - followed by hw csum:

[ 342.190831] vlan1490: hw csum failure
[ 342.190835] CPU: 52 PID: 0 Comm: swapper/52 Not tainted 4.19.0+ #1
[ 342.190836] Call Trace:
[ 342.190839] <IRQ>
[ 342.190849] dump_stack+0x46/0x5b
[ 342.190856] __skb_checksum_complete+0x9a/0xa0
[ 342.190859] tcp_v4_rcv+0xef/0x960
[ 342.190864] ip_local_deliver_finish+0x49/0xd0
[ 342.190866] ip_local_deliver+0x5e/0xe0
[ 342.190869] ? ip_sublist_rcv_finish+0x50/0x50
[ 342.190870] ip_rcv+0x41/0xc0
[ 342.190874] __netif_receive_skb_one_core+0x4b/0x70
[ 342.190877] netif_receive_skb_internal+0x2f/0xd0
[ 342.190879] napi_gro_receive+0xb7/0xe0
[ 342.190884] mlx5e_handle_rx_cqe+0x7a/0xd0
[ 342.190886] mlx5e_poll_rx_cq+0xc6/0x930
[ 342.190888] mlx5e_napi_poll+0xab/0xc90

We got exactly the same backtrace in our data center. However, it is not easy for us to reproduce it, do you have any clue to reproduce it? If you do, try to tcpdump the packets triggering this warning, it could be useful for debugging.

Also, we tried to apply commit d55bef5059dd057bd, the warning _still_ occurs. We tried to revert the offending commit 88078d98d1bb, it disappears. So it is likely that commit 88078d98d1bb introduces more troubles than the one fixed by d55bef5059dd057bd.

Or this could be that mlx5 driver is buggy when dealing with VLAN tags. It both uses vlan_tci (hardware vlan offload) in skb _and_ this piece of code in mlx5e_handle_csum()

	if (network_depth > ETH_HLEN)
		/* CQE csum is calculated from the IP header and does
		 * not cover VLAN headers (if present). This will add
		 * the checksum manually.
		 */
		skb->csum = csum_partial(skb->data + ETH_HLEN,
					 network_depth - ETH_HLEN,
					 skb->csum);

That seems strange to me, because skb_vlan_untag() will not adjust skb->csum in this case.

Bug might be in NETIF_F_RXFCS mlx5 handling btw...
Code does:

	if (unlikely(netdev->features & NETIF_F_RXFCS))
		skb->csum = csum_add(skb->csum,
				     (__force __wsum)mlx5e_get_fcs(skb));

But Dimitris told us that we need to take into account whether the FCS starts at an odd or even offset, so this should probably become:

	if (unlikely(netdev->features & NETIF_F_RXFCS))
		skb->csum = csum_block_add(skb->csum,
					   (__force __wsum)mlx5e_get_fcs(skb),
					   skb->len);
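
To illustrate why the offset parity matters, here is a small userspace sketch. It is not kernel code: the model_* helpers are hypothetical stand-ins that only mimic the arithmetic of csum_partial(), csum_add(), csum_block_add() and csum_fold(). The model sums big-endian 16-bit words and skips the final complement. The idea is that a block checksummed on its own has its bytes in the opposite halves of the 16-bit words when it really starts at an odd offset, so its sum must be rotated by 8 bits before being added.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of csum_partial(): 32-bit one's-complement sum of big-endian
 * 16-bit words; an odd trailing byte is padded with a zero low byte.
 */
static uint32_t model_csum_partial(const uint8_t *buf, size_t len, uint32_t sum)
{
	uint64_t acc = sum;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		acc += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		acc += (uint32_t)buf[len - 1] << 8;
	while (acc >> 32)
		acc = (acc & 0xffffffffu) + (acc >> 32);
	return (uint32_t)acc;
}

/* Model of csum_add(): 32-bit add with end-around carry. */
static uint32_t model_csum_add(uint32_t csum, uint32_t addend)
{
	uint32_t res = csum + addend;

	return res + (res < addend);
}

/* Model of csum_block_add(): rotate the second sum by 8 bits when the
 * block starts at an odd offset, then add.
 */
static uint32_t model_csum_block_add(uint32_t csum, uint32_t csum2, size_t offset)
{
	if (offset & 1)
		csum2 = (csum2 >> 8) | (csum2 << 24);
	return model_csum_add(csum, csum2);
}

/* Fold a 32-bit sum to 16 bits (no final complement in this model). */
static uint16_t model_csum_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Returns 1 if combining the payload sum and the trailer ("FCS") sum
 * gives the same folded result as checksumming the whole frame at once.
 */
static int model_combine_matches(const uint8_t *frame, size_t payload_len,
				 size_t fcs_len, int use_block_add)
{
	uint32_t whole = model_csum_partial(frame, payload_len + fcs_len, 0);
	uint32_t pay = model_csum_partial(frame, payload_len, 0);
	uint32_t fcs = model_csum_partial(frame + payload_len, fcs_len, 0);
	uint32_t comb = use_block_add ?
		model_csum_block_add(pay, fcs, payload_len) :
		model_csum_add(pay, fcs);

	return model_csum_fold(comb) == model_csum_fold(whole);
}
```

With a 5-byte payload followed by a 4-byte trailer, the plain add no longer matches the checksum of the whole frame, while the block add does. With a 4-byte payload both variants agree, which would explain why the problem only shows up for some packet lengths.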