Re: [PATCH net] r8169: fix multicast tx issue with macvlan interface
From: Eric Dumazet <edumazet@google.com>
Date: 2020-03-27 19:43:06
On Fri, Mar 27, 2020 at 12:17 PM Heiner Kallweit [off-list ref] wrote:
On 27.03.2020 19:52, Eric Dumazet wrote:quoted
On Fri, Mar 27, 2020 at 10:41 AM Heiner Kallweit [off-list ref] wrote:quoted
On 27.03.2020 10:39, Heiner Kallweit wrote:quoted
On 27.03.2020 10:08, Charles Daymand wrote:quoted
During kernel upgrade testing on our hardware, we found that macvlan interface were no longer able to send valid multicast packet. tcpdump run on our hardware was correctly showing our multicast packet but when connecting a laptop to our hardware we didn't see any packets. Bisecting turned up commit 93681cd7d94f "r8169: enable HW csum and TSO" activates the feature NETIF_F_IP_CSUM which is responsible for the drop of packet in case of macvlan interface. Note that revision RTL_GIGA_MAC_VER_34 was already a specific case since TSO was keep disabled. Deactivating NETIF_F_IP_CSUM using ethtool is correcting our multicast issue, but we believe that this hardware issue is important enough to keep tx checksum off by default on this revision. The change is deactivating the default value of NETIF_F_IP_CSUM for this specific revision.The referenced commit may not be the root cause but just reveal another issue that has been existing before. Root cause may be in the net core or somewhere else. Did you check with other RTL8168 versions to verify that it's indeed a HW issue with this specific chip version? What you could do: Enable tx checksumming manually (via ethtool) on older kernel versions and check whether they are fine or not. If an older version is fine, then you can start a new bisect with tx checksumming enabled. And did you capture and analyze traffic to verify that actually the checksum is incorrect (and packets discarded therefore on receiving end)?quoted
Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO") Signed-off-by: Charles Daymand <redacted> --- net/drivers/net/ethernet/realtek/r8169_main.c | 3 +++ 1 file changed, 3 insertions(+)diff --git a/net/drivers/net/ethernet/realtek/r8169_main.c b/net/drivers/net/ethernet/realtek/r8169_main.c index a9bdafd15a35..3b69135fc500 100644 --- a/net/drivers/net/ethernet/realtek/r8169_main.c +++ b/net/drivers/net/ethernet/realtek/r8169_main.c@@ -5591,6 +5591,9 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) dev->vlan_features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG); dev->hw_features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG); dev->features &= ~(NETIF_F_ALL_TSO | NETIF_F_SG); + if (tp->mac_version == RTL_GIGA_MAC_VER_34) { + dev->features &= ~NETIF_F_IP_CSUM; + } } dev->hw_features |= NETIF_F_RXALL;After looking a little bit at the macvlen code I think there might be an issue in it, but I'm not sure, therefore let me add Eric (as macvlen doesn't seem to have a dedicated maintainer). r8169 implements a ndo_features_check callback that disables tx checksumming for the chip version in question and small packets (due to a HW issue). macvlen uses passthru_features_check() as ndo_features_check callback, this seems to indicate to me that the ndo_features_check callback of lowerdev is ignored. This could explain the issue you see.macvlan_queue_xmit() calls dev_queue_xmit_accel() after switching skb->dev, so the second __dev_queue_xmit() should eventually call the real_dev ndo_features_check()Thanks, Eric. There's a second path in macvlan_queue_xmit() calling dev_forward_skb(vlan->lowerdev, skb). Does what you said apply also there?
This path wont send packets to the physical NIC, packets are injected back via dev_forward_skb()
Still I find it strange that a tx hw checksumming issue should affect multicasts only. Also the chip version in question is quite common and I would expect others to have hit the same issue. Maybe best would be to re-test on the affected system w/o involving macvlen.quoted
quoted
Would be interesting to see whether it fixes your issue if you let the macvlen ndo_features_check call lowerdev's ndo_features_check. Can you try this? By the way: Also the ndo_fix_features callback of lowerdev seems to be ignored.