Re: net: dsa: lantiq_gswip: getting the first selftests to pass
From: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Date: 2022-07-28 22:10:06
Hi Vladimir, On Thu, Jul 28, 2022 at 12:44 AM Vladimir Oltean [off-list ref] wrote: [...]
quoted
I am starting with tools/testing/selftests/drivers/net/dsa/local_termination.sh as my understanding is that this contains the most basic tests and should be the first step.I don't think I've said local_termination.sh contains the most basic tests. Some tests are basic, but not all are.
noted, thanks for clarifying this [...]
So the absolute first step would be to control the bridge port flags (BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD) and start with good defaults for standalone mode (also set skb->offload_fwd_mark when appropriate in the tagging protocol driver). I think you can use bridge_vlan_aware.sh and bridge_vlan_unaware.sh as starting points to check that these flags still work fine after you've offloaded them to hardware.
My understanding of "good defaults" is: - disable learning on all ports - disable unicast flooding on all ports - enable broadcast flooding on all ports - (GSWIP can only enable broadcast and multicast at the same time, so that's enabled too) I think skb->offload_fwd_mark needs to be set unless we know that the hardware wasn't able to forward the frame/packet. In the vendor sources I was able to find the whole RX tag structure: [0] I am not sure about the "mirror" bit (I assume this is: packet was received on this port because this port is configured as a mirroring target). All other bits seem irrelevant for skb->offload_fwd_mark - meaning we always have to set skb->offload_fwd_mark. I have lots of failures in bridge_vlan_aware.sh and bridge_vlan_unaware.sh - even before any of my changes - which I'll need to investigate.
When flooding a packet to find its destination can be achieved without involving the CPU (*), the next thing will be to simply disable flooding packets of all kind to the CPU (except broadcast). That's when you'll enjoy watching how all the local_termination.sh selftests fail, and you'll be making them pass again, one by one.
it's called test driven development :-)
quoted
Full local_termination.sh selftest output: TEST: lan2: Unicast IPv4 to primary MAC address [ OK ]For this to pass, the driver must properly respond to a port_fdb_add() on the CPU port, with the MAC address of the $swp1 user port's net device, offloaded in the DSA_DB_PORT corresponding to $swp1. In turn, for DSA to even consider passing you FDB entries in DSA_DB_PORT, you must make dsa_switch_supports_uc_filtering() return true. (if you don't know what the words here mean, I've updated the documentation at https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/Documentation/networking/dsa/dsa.rst)
Three out of four cases for the FDB isolation code are clear to me:
- the DSA_DB_BRIDGE case is easy as this is basically what we had
implemented before and I "just" need to look up the FID based on
db.bridge.dev
- DSA_DB_PORT for a non-CPU/user port: we should use the same FID as
standalone ("single port bridge") ports use (that's port number + 1;
see gswip_add_single_port_br())
- GSWIP doesn't support DSA_DB_LAG currently, so I can handle this
with -EOPNOTSUPP
One case took me a while to figure out but I think I finally
understood it (and now it's clear why the FDB patch I sent earlier
cannot be upstreamed):
- DSA_DB_PORT for the CPU port: the port argument for port_fdb_add is
the CPU port - but we can't map this to a FID (those are always tied
to either a bridge or a user port). So instead I need to look at db.dp
and for example use it's index for getting the FID (for standalone
ports the FID is: port index + 1). That results in: we're requested to
install the CPU ports MAC address on the CPU port (6), but what we
actually do is install the FDB entry with the CPU port's MAC address
on a user port (let's say 4, which we get from db.dp). Now if a
packet/frame should target the CPU port we don't need flooding because
the switch knows the destination port based on the FDB entry we
installed.
Note to myself: I'll also need to look at
assisted_learning_on_cpu_port when I'm at the point where I can do
optimizations (rather than fixing failing tests).
[...]Nobody will call port_fdb_add() for the address tested here.quoted
TEST: lan2: Unicast IPv4 to unknown MAC address, promisc [ OK ]Now this passes because the expectation of promiscuous ports is to receive all packets regardless of MAC DA, that's the definition of promiscuity. The driver currently already floods to the CPU, so why wouldn't this pass. Here, what we actually want to capture is that dsa_slave_manage_host_flood(), which responds to changes in the IFF_PROMISC flag on a user port, does actually notify the driver via a call to port_set_host_flood() for that user port. Through this method, the driver is responsible for turning flooding towards the CPU port(s) on or off, from the user port given as argument. If CPU flood control does not depend on user port, then you'll have to keep CPU flooding enabled as long as any user port wants it.
Thanks for this hint. Indeed, I need to implement it like you did in felix_port_set_host_flood() because GSWIP can only enable/disable flooding to the CPU port globally - it can't configure this based on the source port. [...]
quoted
TEST: lan2: Multicast IPv6 to unknown group [FAIL] reception succeeded, but should have failed TEST: lan2: Multicast IPv6 to unknown group, promisc [ OK ] TEST: lan2: Multicast IPv6 to unknown group, allmulti [ OK ] TEST: br0: Unicast IPv4 to primary MAC address [ OK ]Here is where things get interesting. I'm going to take a pause and explain that the bridge related selftests fail in the same way for me too, and that the fixes should go to the bridge driver rather than to DSA.
oh, that's good to know - thanks for sharing this info! [...]
quoted
TEST: br0: Unicast IPv4 to unknown MAC address [FAIL] reception succeeded, but should have failed TEST: br0: Unicast IPv4 to unknown MAC address, promisc [ OK ] TEST: br0: Unicast IPv4 to unknown MAC address, allmulti [FAIL] reception succeeded, but should have failed TEST: br0: Multicast IPv4 to joined group [ OK ]Just one more small comment to make. The addresses in the "br0" tests are also notified through port_mdb_add(), but they use DSA_DB_BRIDGE since they come to DSA via switchdev rather than via dev_mc_add() - they came _to_the_bridge_ via dev_mc_add(). DSA drivers are expect to offload these multicast entries to a different database than the port_mdb_add() I've described above. This is a database that is active only while $swp1 is part of a bridge, while DSA_DB_PORT is active only while $swp1 is standalone.
The vendor driver lists a bunch of possible tables: [1] I'll leave any effort for multicast out for now... it's not the main use-case I am looking at right now. As always: thank you for your amazing explanations, hints and pointers! Also I would like to point out that I am still doing all of this in my spare time. Whenever you have other conversations to focus on: please do so! For me it's not critical if you're getting back to me a few days sooner or later. Best regards, Martin [0] https://github.com/paldier/K3C/blob/ca7353eb397090c363632409d9ca568d3cca09c7/ugw/target/linux/lantiq/patches-3.10/7000-NET-lantiq-adds-eth-drv.patch#L2238-L2259 [1] https://github.com/paldier/K3C/blob/ca7353eb397090c363632409d9ca568d3cca09c7/ugw/target/linux/lantiq/files/drivers/net/ethernet/lantiq/switch-api/ltq_flow_core.h#L164-L192