Re: [PATCH net v1] net: dsa: lantiq_gswip: Fix FDB add/remove on the CPU port

From: Vladimir Oltean <olteanv@gmail.com>
Date: 2022-07-06 21:07:02
Also in: lkml, stable

On Sun, Jul 03, 2022 at 12:53:45AM +0200, Martin Blumenstingl wrote:

Hi Vladimir,

On Sat, Jul 2, 2022 at 8:56 PM Vladimir Oltean [off-list ref] wrote:

quoted

On Sat, Jul 02, 2022 at 07:43:11PM +0200, Martin Blumenstingl wrote:

quoted

Hi Vladimir,

On Fri, Jul 1, 2022 at 3:02 PM Vladimir Oltean [off-list ref] wrote:
[...]

quoted

Use FID 0 (which is also the "default" FID) when adding/removing an FDB
entry for the CPU port.

What does "default" FID even mean, and why is the default FID relevant?

The GSW140 datasheet [0] (which is for a newer IP than the one we are
targeting currently with the GSWIP driver - but I am not aware of any
older datasheets)

Thanks for the document! Really useful.

Great that this helps. Whenever you hear me contradicting statements
from that datasheet then please let me know. There have been subtle
changes between the different versions of the IP, so I may have to
double check with the vendor driver to see if things still apply to
older versions.

quoted

page 78 mentions: "By default the FID is zero and all entries belong
to shared VLAN learning."

Not talking about the hardware defaults when it's obvious the driver
changes those, in an attempt to comply to Linux networking expectations...

quoted

In any case, I recommend you to first set up a test bench where you
actually see a difference between packets being flooded to the CPU vs
matching an FDB entry targeting it. Then read up a bit what the provided
dsa_db argument wants from port_fdb_add(). This conversation with Alvin
should explain a few things.
https://patchwork.kernel.org/project/netdevbpf/cover/20220302191417.1288145-1-vladimir.oltean@nxp.com/#24763870

I previously asked Hauke whether the RX tag (net/dsa/tag_gswip.c) has
some bit to indicate whether traffic is flooded - but to his knowledge
the switch doesn't provide this information.

Yeah, you generally won't find quite that level of detail even in more
advanced switches. Not that you need it...

quoted

So I am not sure what I can do in this case - do you have any pointers for me?

Yes, I do.

gswip_setup has:

        /* Default unknown Broadcast/Multicast/Unicast port maps */
        gswip_switch_w(priv, BIT(cpu_port), GSWIP_PCE_PMAP1);
        gswip_switch_w(priv, BIT(cpu_port), GSWIP_PCE_PMAP2);
        gswip_switch_w(priv, BIT(cpu_port), GSWIP_PCE_PMAP3); <- replace BIT(cpu_port) with 0

If you can no longer ping, it means that flooding was how packets
reached the system.

I tried this but I can still ping OpenWrt's br-lan IP.

Yes, so I looked at the GSW140 documentation again and found:

Table 38 Special Tag Ingress Format
Bit 4 Force no learning (1 B = address is not learned, 0 B = ignore)

This header format is quite different compared to what is supported by
gswip_tag_xmit() - 8 bytes vs 4 - but if they're at all similar, the
GSWIP may have a similar "learn disable" bit per packet, which the
tag proto driver isn't setting => address learning takes place.

Please note that there is a reason why a "learn disable" bit exists.
To work properly from all angles, only traffic injected into the switch
from br-lan should be learned by the hardware. Traffic injected from
standalone ports shouldn't.

I zero'ed the GSWIP_PCE_PMAP2 register (which according to the
documentation is used for L2 multicast/broadcast flooding) as well,
which changes the behavior:
- once the br-lan (IP address: 192.168.1.14) interface is brought up
it cannot be ping'ed from a device connected to one of the switch
ports ("Destination Host Unreachable")
- I can ping a device connected to the switch from within OpenWrt
(meaning: ping from the CPU port to a device with IP 192.168.1.100 on
one of the switch port works)
- once I start the ping from within OpenWrt I immediately get replies
from OpenWrt to the other device

ping log:
    [similar messages omitted, only the icmp_seq is different]
    From 192.168.1.100 icmp_seq=87 Destination Host Unreachable
    [this is when I start "ping 192.168.1.100" from within OpenWrt)
    64 bytes from 192.168.1.14: icmp_seq=88 ttl=64 time=3016 ms
    64 bytes from 192.168.1.14: icmp_seq=89 ttl=64 time=2002 ms
    64 bytes from 192.168.1.14: icmp_seq=90 ttl=64 time=989 ms
    64 bytes from 192.168.1.14: icmp_seq=91 ttl=64 time=0.379 ms

I made sure that the changes from my patch are not applied:
# dmesg | grep " to fdb: -22" | wc -l
9

Also in case it's relevant: I added some printk's to
gswip_port_fdb_dump() (because I don't know how to differentiate
"hardware FDB" from "software FDB" entries in "bridge fdb show brport
lan1"):
The switch seems to learn the CPU port's MAC address automatically -
even before I issue "ping 192.168.1.100" (most likely due to something
in OpenWrt accessing the network).
The "static" flag is not set though (which is expected I think).

Ok, so this wasn't too complicated it seems. Thanks for doing the tests.

There isn't a better way than to printk the FDB entries on the CPU port,
given that there isn't a netdev for that port through which we could
report .ndo_fdb_dump.

What exists is the concept of "devlink regions" and "devlink port regions".
These are named binary bits exposed by certain DSA drivers that are
dumped by user space over netlink, and interpreted by a vendor specific
tool.

Andrew Lunn wrote mv88e6xxx_dump for... mv88e6xxx
https://github.com/lunn/mv88e6xxx_dump

and that can be used to dump information for CPU ports. It includes
entire Address Translation Unit (ATU) entries, so the format is a quite
a bit more detailed than bridge FDB entries since it is hardware specific.

There are various forks of that tool for other pieces of hardware, like
sja1105_dump:
https://github.com/vladimiroltean/mv88e6xxx_dump

To my knowledge there hasn't been any successful attempt in unifying all
forks into a larger dsa_dump, although I've wanted to do that.

Mentioning this just in case you have some spare time to work on a small
little debugging tool. If you're fine with printk that's cool too.

As a side-note: I think the comment is partially incorrect. At least
for the GSWIP IP revision which the driver is targeting,
GSWIP_PCE_PMAP1 is for the "monitoring" port. My understanding is that
this "monitoring port" is used with port mirroring (which the hardware
supports but we don't implement in the driver yet).

quoted

It appears that what goes on is interesting.
The switch is configured to flood traffic that's unknown to the FDB only
to the CPU (notably not to other bridged ports).
In software, the packet reaches tag_gswip.c, where unlike the majority
of other DSA tagging protocols, we do not call dsa_default_offload_fwd_mark(skb).
Then, the packet reaches the software bridge, and the switch has
informed the bridge (via skb->offload_fwd_mark == 0) that the packet
hasn't been already flooded in hardware, so the software bridge needs to
do it (only if necessary, of course).

The software bridge floods the packet according to its own FDB. In your
case, the software bridge recognizes the MAC DA of the packet as being
equal to the MAC address of br0 itself, and so, it doesn't flood it,
just terminates it locally. This is true whether or not the switch
learned that address in its FDB on the CPU port.

quoted

Also apologies if all of this is very obvious. So far I have only been
working on the xMII part of Ethernet drivers, meaning: I am totally
new to the FDB part.

quoted

Then have a patch (set) lifting the "return -EINVAL" from gswip *properly*.
And only then do we get to ask the questions "how bad are things for
linux-5.18.y? how bad are they for linux-5.15.y? what do we need to do?".

agreed


Thanks again for your time and all these valuable hints Vladimir!
Martin


[0] https://assets.maxlinear.com/web/documents/617930_gsw140_ds_rev1.11.pdf

So if I'm right, the state of facts is quite "not broken" (quite the
other way around, I'm really impressed), although there are still
improvements to be made. Flooding could be offloaded to hardware, then
flooding to CPU could be turned off and controlled via port promiscuity.
This would save quite a few CPU cycles.

Hearing that things are not horribly broken is great!
Also saving a few CPU cycles would be awesome since this SoCs has a
500MHz MIPS 34Kc core with two VPEs (meaning: one core which supports
SMT - or "HT" as known in the Intel world). So any CPU cycle that can
be saved helps

The first thing will be to get the driver to pass the existing
selftests. If you look at tools/testing/selftests/drivers/net/dsa/local_termination.sh,
that is what should ultimately pass, since it checks that the packets
that should be received are received, and the ones that shouldn't aren't.

But to get there we'll need to make smaller steps, like disable address
learning on standalone ports, isolate FDBs, maybe offload the bridge TX
forwarding process (in order to populate the "Force no learning" bit in
tag_gswip.c properly), and only then will the local_termination test
also pass. There's also some more work to do in the bridge driver, but
we're getting there slowly.

I'll give some more details about these things in the thread with the
selftest for the configure_vlan_while_not_filtering feature, there are
quite a few things to be said.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help