Re: [PATCH net-next 5/8] net: lan966x: Add lag support for lan966x.
From: Vladimir Oltean <olteanv@gmail.com>
Date: 2022-06-27 09:40:42
Also in: lkml
On Mon, Jun 27, 2022 at 08:46:12AM +0200, Horatiu Vultur wrote:
> > This incorrect logic seems to have been copied from ocelot from before commit a14e6b69f393 ("net: mscc: ocelot: fix incorrect balancing with down LAG ports"). The issue is that you calculate bond_mask with only_active_ports=true. This means the for_each_set_bit() will not iterate through the inactive LAG ports, and won't set the bond_mask as the PGID destination for those ports. That isn't what is desired; as explained in that commit, inactive LAG ports should be removed via the aggregation PGIDs and not via the destination PGIDs. Otherwise, an FDB entry targeted towards the LAG (effectively towards the "primary" LAG port, whose logical port ID gives the LAG ID) will not egress even on the "secondary" LAG port if the primary's link is down.
>
> Thanks for looking at this. That is correct, ocelot was the source of inspiration. The issue that you described in the mentioned commit is fixed in the last patch of this series. I will have a look at your commit and will try to integrate it. Thanks.
I figured that would be the case, although I didn't really understand the explanation from patch 8/8 (arguably, there it is said that the switch tries to send on the down port, not that it won't send on the up port, which is more relevant information). But in any case, it would be good to introduce code that works from the beginning, rather than fix it up in a follow-up patch. I believe that the commit I referenced is a simplification either way, since it removes the "only_active_ports" argument from the bond mask function.
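To make the failure mode concrete, here is a minimal user-space sketch of the masking, not driver code. It assumes a simplified model in which the egress mask is the destination PGID ANDed with the aggregation PGID, and in which the LAG ID equals the lowest member port number. All names (struct port, bond_mask(), set_dst_pgids(), aggr_pgid(), lag_egress()) and the 4-port layout are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PORTS 4	/* hypothetical port count for this sketch */
#define NUM_AGGR  16	/* number of aggregation codes assumed by the model */

struct port {
	int bond;	/* LAG ID (lowest member port), or -1 if standalone */
	int link_up;
};

/* Mask of the ports in @bond; optionally restricted to ports with link up. */
static uint32_t bond_mask(const struct port *p, int bond, int only_active)
{
	uint32_t mask = 0;

	for (int i = 0; i < NUM_PORTS; i++)
		if (p[i].bond == bond && (!only_active || p[i].link_up))
			mask |= 1u << i;
	return mask;
}

/* Destination PGIDs: default to the port itself, then point every port
 * found in the bond mask at that mask. With only_active set, ports with
 * link down are never visited and keep their default entry.
 */
static void set_dst_pgids(const struct port *p, uint32_t *dst, int only_active)
{
	for (int i = 0; i < NUM_PORTS; i++)
		dst[i] = 1u << i;

	for (int i = 0; i < NUM_PORTS; i++) {
		uint32_t mask;

		if (p[i].bond < 0)
			continue;
		mask = bond_mask(p, p[i].bond, only_active);
		for (int j = 0; j < NUM_PORTS; j++)
			if (mask & (1u << j))
				dst[j] = mask;
	}
}

/* Aggregation PGID for @code: all standalone ports, plus exactly one
 * active member per LAG. This is where down ports are filtered out.
 */
static uint32_t aggr_pgid(const struct port *p, int code)
{
	uint32_t mask = (1u << NUM_PORTS) - 1;

	for (int bond = 0; bond < NUM_PORTS; bond++) {
		uint32_t all = bond_mask(p, bond, 0);
		uint32_t active = bond_mask(p, bond, 1);
		int n = 0, seen = 0;

		if (!all)
			continue;
		mask &= ~all;
		for (int i = 0; i < NUM_PORTS; i++)
			n += (active >> i) & 1;
		if (!n)
			continue;
		for (int i = 0; i < NUM_PORTS; i++) {
			if (!(active & (1u << i)))
				continue;
			if (seen++ == code % n) {
				mask |= 1u << i;
				break;
			}
		}
	}
	return mask;
}

/* Egress mask for an FDB entry pointing at the LAG's primary port (port 0),
 * in a fixed scenario: ports 0 and 1 form LAG 0, port 0 has link down.
 */
static uint32_t lag_egress(int only_active, int code)
{
	const struct port ports[NUM_PORTS] = {
		{ .bond = 0,  .link_up = 0 },	/* primary, link down */
		{ .bond = 0,  .link_up = 1 },	/* secondary, link up */
		{ .bond = -1, .link_up = 1 },
		{ .bond = -1, .link_up = 1 },
	};
	uint32_t dst[NUM_PORTS];

	assert(code >= 0 && code < NUM_AGGR);
	set_dst_pgids(ports, dst, only_active);
	return dst[0] & aggr_pgid(ports, code);
}
```

In this model, lag_egress(0, code) yields BIT(1) for every aggregation code, so traffic for the LAG leaves through the secondary port, whereas lag_egress(1, code) yields an empty mask: the primary's destination PGID still points only at the primary, and the aggregation PGIDs then remove it.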