Thread (88 messages) 88 messages, 9 authors, 2022-02-01

Re: [PATCH net-next v4 11/11] net: dsa: realtek: rtl8365mb: multiple cpu ports, non cpu extint

From: Arınç ÜNAL <hidden>
Date: 2022-01-25 09:49:56

On 25/01/2022 01:30, Vladimir Oltean wrote:
On Mon, Jan 24, 2022 at 01:42:42PM -0800, Jakub Kicinski wrote:
quoted
On Mon, 24 Jan 2022 22:56:07 +0200 Vladimir Oltean wrote:
quoted
On Mon, Jan 24, 2022 at 11:38:12AM -0800, Jakub Kicinski wrote:
quoted
On Mon, 24 Jan 2022 21:08:45 +0200 Vladimir Oltean wrote:
quoted
So before we declare that any given Ethernet driver is buggy for declaring
NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM and not checking that skb->csum_start
points where it expects it to (taking into consideration potential VLAN
headers, IPv6 extension headers),
Extension headers are explicitly not supported by NETIF_F_IPV6_CSUM.

IIRC Tom's hope was to delete NETIF_F_IP*_CSUM completely once all
drivers are converted to parsing and therefore can use NETIF_F_HW_CSUM.
IIUC, NETIF_F_IP*_CSUM vs NETIF_F_HW_CSUM doesn't make that big of a
difference in terms of what the driver should check for, if the hardware
checksum offload engine can't directly be given the csum_start and
csum_offset, wherever they may be.
quoted
quoted
is there any driver that _does_ perform these checks correctly, that
could be used as an example?
I don't think so. Let me put it this way - my understanding is that up
until now we had been using the vlan_features, mpls_features etc to
perform L2/L2.5/below-IP feature stripping. This scales poorly to DSA
tags, as discussed in this thread.

I'm suggesting we extend the kind of checking we already do to work
around inevitable deficiencies of device parsers for tunnels to DSA
tags.
Sorry, I'm very tired and I probably don't understand what you're
saying, so excuse the extra clarification questions.

The typical protocol checking that drivers with NETIF_F_HW_CSUM do seems
to be based on vlan_get_protocol()/skb->protocol/skb_network_header()/
skb_transport_header() values, all of which make DSA invisible. So they
don't work if the underlying hardware really doesn't like seeing an
unexpected DSA header.

When you say "I'm suggesting we extend the kind of checking we already do",
do you mean we should modify the likes of e1000e and igb such that, if
they're ever used as DSA masters, they do a full header parse of the
packet (struct ethhdr :: h_proto, check if VLAN, struct iphdr/ipv6hdr,
etc.) instead of the current logic?
That was my thinking, yes. The exact amount of work depends on the
driver, I believe that more recent Intel parts (igb, ixgbe and newer)
pass a L3 offset to the HW. They treat L2 as opaque, ergo no patches
needed. At a glance e1000e passes the full skb_checksum_start_offset()
to HW, so likely even better.
Ah, right, I missed that. I agree that a driver that uses
skb->csum_start is very likely to work unmodified with DSA headers
(not trailers). I didn't notice this in e1000 because I was just
searching for csum_start.
quoted
It's only drivers for devices which actually want to parse the Ethertype
that would need extra checks. (Coincidentally such devices can't support
MPLS given the lack of L3 indication in the frame.)
quoted
It will be pretty convoluted unless
we have some helper. Because if I follow through, for a DSA-tagged IP
packet on xmit, skb->protocol is certainly htons(ETH_P_IP):

ntohs(skb->protocol) = 0x800, csum_offset = 16, csum_start = 280, skb_checksum_start_offset = 54, skb->network_header = 260, skb_network_header_len = 20

skb_dump output:
skb len=94 headroom=226 headlen=94 tailroom=384
mac=(226,34) net=(260,20) trans=280
shinfo(txflags=0 nr_frags=0 gso(size=0 type=1 segs=1))
csum(0x100118 ip_summed=3 complete_sw=0 valid=0 level=0)
hash(0x7710ee84 sw=0 l4=1) proto=0x0800 pkttype=0 iif=0
dev name=eno2 feat=0x00020100001149a9
sk family=2 type=1 proto=6
skb headroom: 00000000: 6c 00 03 02 64 65 76 00 fe ed ca fe 28 00 00 00
...(junk)...
skb headroom: 000000e0: 5f 43
                         20 byte DSA tag
                         |
                         v
skb linear:   00000000: 88 80 00 0a 80 00 00 00 00 00 00 00 08 00 30 00
                                     skb_mac_header()
                                     |
                                     v
skb linear:   00000010: 00 00 00 00 68 05 ca 92 af 20 00 04 9f 05 f6 28
                               skb_network_header()
                               |
                               v
skb linear:   00000020: 08 00 45 00 00 3c 26 47 40 00 40 06 00 49 0a 00
                                           skb_checksum_start_offset
                                           |
                                           |                       csum_offset
                                           v                       v
skb linear:   00000030: 00 2c 0a 00 00 01 b6 08 14 51 11 1f 91 4f 00 00
skb linear:   00000040: 00 00 a0 02 fa f0 14 5b 00 00 02 04 05 b4 04 02
skb linear:   00000050: 08 0a 2e 00 e5 b8 00 00 00 00 01 03 03 07
Oof, so in this case the DSA tag is _before_ the skb_mac_header()?
Or the prepend is supposed to be parsable as a Ethernet header?
Seems like any device that can do csum over this packet must already
use L3/L4 offsets or have explicit knowledge of DSA, right?
I'm sorry, there's a mistake, skb_mac_header() points to the DSA tag
here (and skb_mac_header_len is 34), I wanted to say "real MAC header"
but failed to do so. This case shouldn't pose any special problems.
quoted
quoted
I don't know, I just don't expect that non-DSA users of those drivers
will be very happy about such changes. Do these existing protocol
checking schemes qualify as buggy?
Unfortunate reality of the checksum offloads is that most drivers for
devices which parse on Tx are buggy, it's more of a question of whether
anyone tried to use an unsupported protocol stack :( Recent example
that comes to mind is 1698d600b361 ("bnxt_en: Implement
.ndo_features_check().").
Nice hook this ndo_features_check! In my reading of validate_xmit_skb()
I went right past it.
quoted
quoted
If this is the convention that we want to enforce, then I can't really
help Luiz with fixing the OpenWRT mtk_eth_soc.c - he'll have to figure
out a way to parse the packets for which his hardware will accept the
checksumming offload, and call skb_checksum_help() otherwise.
quoted
We can come up with various schemes of expressing capabilities
between underlying driver and tag driver. I'm not aware of similar
out-of-band schemes existing today so it'd be "DSA doing it's own
thing", which does not seem great.
It at least seems less complex to me, and less checking in the fast path
if I understand everything that's been said correctly.
I understand, I'm primarily trying to share some context and prior work.
I don't mean to nack all other approaches.

I believe writing a parser matching the device behavior would be easier
for a driver author than interpreting the runes of our csum offload API
and getting thru the thicket of all the bits. If that's not the case my
argument is likely defeated.
Ok, writing a parser might be needed if the DSA master is going, for
some reason, to support TX checksum offloading with some DSA headers but
not with others.

If that is not the case, and that Ethernet controller simply doesn't
support TX checksumming unless it's the plain old Ethernet {+ VLAN} +
IP/IPv6 + TCP/UDP, then this blanket patch below should fix the problem
almost elegantly, and parsing is a useless complication (warning, not
even compile-tested!):
I tried it on the upstream mtk_eth_soc on an mt7621a board with an 
rtl8367s switch connected to the 2nd GMAC of the SoC running 5.17-rc1.

Although "ethtool --show-offload eth1" does not show anything different, 
I can now ssh to the device (only way that came into my mind to quickly 
check TCP traffic).
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1986,6 +1986,16 @@ static int mtk_hwlro_get_fdir_all(struct
  	return 0;
  }

+static netdev_features_t fe_features_check(struct sk_buff *skb,
+					   struct net_device *dev,
+					   netdev_features_t features)
+{
+	if (netdev_uses_dsa(dev))
+		features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
+
+	return features;
+}
+
  static netdev_features_t mtk_fix_features(struct net_device *dev,
  					  netdev_features_t features)
  {
@@ -2906,6 +2916,7 @@ static const struct net_device_ops mtk_n
  #ifdef CONFIG_NET_POLL_CONTROLLER
  	.ndo_poll_controller	= mtk_poll_controller,
  #endif
+	.ndo_features_check	= fe_features_check,
  	.ndo_setup_tc		= mtk_eth_setup_tc,
  };

quoted hunk ↗ jump to hunk
-----------------------------[ cut here ]-----------------------------
 From 5ef3d3cd8441d756933558212f518f48754c64d9 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <vladimir.oltean@nxp.com>
Date: Tue, 25 Jan 2022 00:16:57 +0200
Subject: [PATCH] ramips: ethernet: ralink: disable TX checksumming on DSA
  masters

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
  .../files/drivers/net/ethernet/ralink/mtk_eth_soc.c  | 12 ++++++++++++
  1 file changed, 12 insertions(+)
diff --git a/target/linux/ramips/files/drivers/net/ethernet/ralink/mtk_eth_soc.c b/target/linux/ramips/files/drivers/net/ethernet/ralink/mtk_eth_soc.c
index e07e5ed5a8f8..6ed9bc5942fd 100644
--- a/target/linux/ramips/files/drivers/net/ethernet/ralink/mtk_eth_soc.c
+++ b/target/linux/ramips/files/drivers/net/ethernet/ralink/mtk_eth_soc.c
@@ -31,6 +31,7 @@
  #include <linux/io.h>
  #include <linux/bug.h>
  #include <linux/netfilter.h>
+#include <net/dsa.h>
  #include <net/netfilter/nf_flow_table.h>
  #include <linux/of_gpio.h>
  #include <linux/gpio.h>
@@ -1497,6 +1498,16 @@ static int fe_change_mtu(struct net_device *dev, int new_mtu)
  	return fe_open(dev);
  }
  
+static netdev_features_t fe_features_check(struct sk_buff *skb,
+					   struct net_device *dev,
+					   netdev_features_t features)
+{
+	if (netdev_uses_dsa(dev))
+		features &= ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
+
+	return features;
+}
+
  static const struct net_device_ops fe_netdev_ops = {
  	.ndo_init		= fe_init,
  	.ndo_uninit		= fe_uninit,
@@ -1514,6 +1525,7 @@ static const struct net_device_ops fe_netdev_ops = {
  #ifdef CONFIG_NET_POLL_CONTROLLER
  	.ndo_poll_controller	= fe_poll_controller,
  #endif
+	.ndo_features_check	= fe_features_check,
  };
  
  static void fe_reset_pending(struct fe_priv *priv)
-----------------------------[ cut here ]-----------------------------
This is essentially what Florian said ages ago, it just took me a very
long time to process. I guess what hadn't fully clicked in my head is
that the TX checksumming offload being functional is more a matter of
telling the hardware what are the L3 and L4 offsets, and the csum_offset,
rather than it requiring any particular understanding of the DSA header.
In turn, it means that for "nice" Ethernet controller implementations
where that is the case, it would be actively detrimential to add a new
.ndo_get_dsa_features() or something like that - because such a driver
would report that it supports all DSA header-type formats (trailers are
still broken as long as there isn't a csum_end). And keeping that kind
of driver in sync with all DSA protocols that appear will become a
repetitive task.

So crisis averted, I guess?
Thanks a lot to both of you for the patient explanations!
I retract my proposal for a new ndo and also suggest that the DSA master
driver takes care to not leave as zero a TX checksum it can't offload.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help