Re: net: dsa: lantiq_gswip: getting the first selftests to pass

From: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Date: 2022-07-28 22:10:06

Hi Vladimir,

On Thu, Jul 28, 2022 at 12:44 AM Vladimir Oltean [off-list ref] wrote:
[...]

quoted

I am starting with
tools/testing/selftests/drivers/net/dsa/local_termination.sh as my
understanding is that this contains the most basic tests and should be
the first step.

I don't think I've said local_termination.sh contains the most basic
tests. Some tests are basic, but not all are.

noted, thanks for clarifying this

[...]

So the absolute first step would be to control the bridge port flags
(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD) and start
with good defaults for standalone mode (also set skb->offload_fwd_mark
when appropriate in the tagging protocol driver). I think you can use
bridge_vlan_aware.sh and bridge_vlan_unaware.sh as starting points to
check that these flags still work fine after you've offloaded them to
hardware.

My understanding of "good defaults" is:
- disable learning on all ports
- disable unicast flooding on all ports
- enable broadcast flooding on all ports
- (GSWIP can only enable broadcast and multicast at the same time, so
that's enabled too)

I think skb->offload_fwd_mark needs to be set unless we know that the
hardware wasn't able to forward the frame/packet.
In the vendor sources I was able to find the whole RX tag structure: [0]
I am not sure about the "mirror" bit (I assume this is: packet was
received on this port because this port is configured as a mirroring
target). All other bits seem irrelevant for skb->offload_fwd_mark -
meaning we always have to set skb->offload_fwd_mark.

I have lots of failures in bridge_vlan_aware.sh and
bridge_vlan_unaware.sh - even before any of my changes - which I'll
need to investigate.

When flooding a packet to find its destination can be achieved without
involving the CPU (*), the next thing will be to simply disable flooding
packets of all kind to the CPU (except broadcast). That's when you'll
enjoy watching how all the local_termination.sh selftests fail, and
you'll be making them pass again, one by one.

it's called test driven development :-)

quoted

Full local_termination.sh selftest output:
TEST: lan2: Unicast IPv4 to primary MAC address                 [ OK ]

For this to pass, the driver must properly respond to a port_fdb_add()
on the CPU port, with the MAC address of the $swp1 user port's net device,
offloaded in the DSA_DB_PORT corresponding to $swp1.

In turn, for DSA to even consider passing you FDB entries in DSA_DB_PORT,
you must make dsa_switch_supports_uc_filtering() return true.

(if you don't know what the words here mean, I've updated the documentation at
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/tree/Documentation/networking/dsa/dsa.rst)

Three out of four cases for the FDB isolation code are clear to me:
- the DSA_DB_BRIDGE case is easy as this is basically what we had
implemented before and I "just" need to look up the FID based on
db.bridge.dev
- DSA_DB_PORT for a non-CPU/user port: we should use the same FID as
standalone ("single port bridge") ports use (that's port number + 1;
see gswip_add_single_port_br())
- GSWIP doesn't support DSA_DB_LAG currently, so I can handle this
with -EOPNOTSUPP

One case took me a while to figure out but I think I finally
understood it (and now it's clear why the FDB patch I sent earlier
cannot be upstreamed):
- DSA_DB_PORT for the CPU port: the port argument for port_fdb_add is
the CPU port - but we can't map this to a FID (those are always tied
to either a bridge or a user port). So instead I need to look at db.dp
and for example use it's index for getting the FID (for standalone
ports the FID is: port index + 1). That results in: we're requested to
install the CPU ports MAC address on the CPU port (6), but what we
actually do is install the FDB entry with the CPU port's MAC address
on a user port (let's say 4, which we get from db.dp). Now if a
packet/frame should target the CPU port we don't need flooding because
the switch knows the destination port based on the FDB entry we
installed.

Note to myself: I'll also need to look at
assisted_learning_on_cpu_port when I'm at the point where I can do
optimizations (rather than fixing failing tests).

[...]

Nobody will call port_fdb_add() for the address tested here.

quoted

TEST: lan2: Unicast IPv4 to unknown MAC address, promisc        [ OK ]

Now this passes because the expectation of promiscuous ports is to
receive all packets regardless of MAC DA, that's the definition of
promiscuity. The driver currently already floods to the CPU, so why
wouldn't this pass.

Here, what we actually want to capture is that dsa_slave_manage_host_flood(),
which responds to changes in the IFF_PROMISC flag on a user port, does
actually notify the driver via a call to port_set_host_flood() for that
user port. Through this method, the driver is responsible for turning
flooding towards the CPU port(s) on or off, from the user port given as
argument. If CPU flood control does not depend on user port, then you'll
have to keep CPU flooding enabled as long as any user port wants it.

Thanks for this hint. Indeed, I need to implement it like you did in
felix_port_set_host_flood() because GSWIP can only enable/disable
flooding to the CPU port globally - it can't configure this based on
the source port.

[...]

quoted

TEST: lan2: Multicast IPv6 to unknown group                     [FAIL]
        reception succeeded, but should have failed
TEST: lan2: Multicast IPv6 to unknown group, promisc            [ OK ]
TEST: lan2: Multicast IPv6 to unknown group, allmulti           [ OK ]
TEST: br0: Unicast IPv4 to primary MAC address                  [ OK ]

Here is where things get interesting. I'm going to take a pause and
explain that the bridge related selftests fail in the same way for me
too, and that the fixes should go to the bridge driver rather than to
DSA.

oh, that's good to know - thanks for sharing this info!

[...]

quoted

TEST: br0: Unicast IPv4 to unknown MAC address                  [FAIL]
        reception succeeded, but should have failed
TEST: br0: Unicast IPv4 to unknown MAC address, promisc         [ OK ]
TEST: br0: Unicast IPv4 to unknown MAC address, allmulti        [FAIL]
        reception succeeded, but should have failed
TEST: br0: Multicast IPv4 to joined group                       [ OK ]

Just one more small comment to make.
The addresses in the "br0" tests are also notified through port_mdb_add(),
but they use DSA_DB_BRIDGE since they come to DSA via switchdev rather
than via dev_mc_add() - they came _to_the_bridge_ via dev_mc_add().
DSA drivers are expect to offload these multicast entries to a different
database than the port_mdb_add() I've described above. This is a
database that is active only while $swp1 is part of a bridge, while
DSA_DB_PORT is active only while $swp1 is standalone.

The vendor driver lists a bunch of possible tables: [1]
I'll leave any effort for multicast out for now... it's not the main
use-case I am looking at right now.

As always: thank you for your amazing explanations, hints and pointers!
Also I would like to point out that I am still doing all of this in my
spare time. Whenever you have other conversations to focus on: please
do so! For me it's not critical if you're getting back to me a few
days sooner or later.


Best regards,
Martin


[0] https://github.com/paldier/K3C/blob/ca7353eb397090c363632409d9ca568d3cca09c7/ugw/target/linux/lantiq/patches-3.10/7000-NET-lantiq-adds-eth-drv.patch#L2238-L2259
[1] https://github.com/paldier/K3C/blob/ca7353eb397090c363632409d9ca568d3cca09c7/ugw/target/linux/lantiq/files/drivers/net/ethernet/lantiq/switch-api/ltq_flow_core.h#L164-L192

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help