Re: [PATCH 1/1] net: bridge: sync fdb to new unicast-filtering ports
From: Wolfgang Bumiller <hidden>
Date: 2021-07-02 07:34:52
Also in:
bridge
On Thu, Jul 01, 2021 at 06:33:20PM +0300, Nikolay Aleksandrov wrote:
On 01/07/2021 17:51, Thomas Lamprecht wrote:quoted
On 01.07.21 15:49, Nikolay Aleksandrov wrote:quoted
On 01/07/2021 15:28, Wolfgang Bumiller wrote:quoted
Since commit 2796d0c648c9 ("bridge: Automatically manage port promiscuous mode.") bridges with `vlan_filtering 1` and only 1 auto-port don't set IFF_PROMISC for unicast-filtering-capable ports. Normally on port changes `br_manage_promisc` is called to update the promisc flags and unicast filters if necessary, but it cannot distinguish between *new* ports and ones losing their promisc flag, and new ports end up not receiving the MAC address list. Fix this by calling `br_fdb_sync_static` in `br_add_if` after the port promisc flags are updated and the unicast filter was supposed to have been filled. Signed-off-by: Wolfgang Bumiller <redacted> --- net/bridge/br_if.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c index f7d2f472ae24..183e72e7b65e 100644 --- a/net/bridge/br_if.c +++ b/net/bridge/br_if.c@@ -652,6 +652,18 @@ int br_add_if(struct net_bridge *br, struct net_device *dev, list_add_rcu(&p->list, &br->port_list); nbp_update_port_count(br); + if (!br_promisc_port(p) && (p->dev->priv_flags & IFF_UNICAST_FLT)) { + /* When updating the port count we also update all ports' + * promiscuous mode. + * A port leaving promiscuous mode normally gets the bridge's + * fdb synced to the unicast filter (if supported), however, + * `br_port_clear_promisc` does not distinguish between + * non-promiscuous ports and *new* ports, so we need to + * sync explicitly here. + */ + if (br_fdb_sync_static(br, p)) + netdev_err(dev, "failed to sync bridge addresses to this port\n"); + } netdev_update_features(br->dev);Hi,Hi, commenting as was peripherally involved into this.quoted
The patch is wrong because br_add_if() can fail after you sync these entries and then nothing will unsync them. Out of curiousity what's the use case of a bridge with a single port only ? Because, as you've also noted, this will be an issue only if there is a single port and sounds like a corner case, maybe there's a better way to handle it.In practice you're right, it is not often useful, but that does not means that it won't happen. For example, in Proxmox VE, a hypervisor/clustering debian-based distro, we recommend users that they need to migrate all (QEMU) VMs to another cluster-node when doing a (major) upgrade as with that way they get no downtime for the VMs. Now, if the user had a bridge with a single port this was not an issue as long as VMs where running the TAP device we use for them where bridge ports too. But on reboot, with all VMs and thus ports still gone, the system comes up with that bridge having a single port. That itself was seen as a problem until recently because the system set the MAC of the bridge to one of the bridge ports. But with the next Debian Version (Bullseye) we're pulling in a systemd version which now defaults to MACAddressPolicy=persistent[0] also for virtual devices like bridges, so with that update done and rebooted the bridge has another MAC address, not matching the one of a bridge port anymore, which means the host may, depending on some other side effects like vlan-awareness on (as without that promisc would be enabled anyway), not be ping'able and other issue anymore. Due to some specialty handling of learning/filtering in specific drivers this is not reproducible on every NIC model (IIRC, it was in igb and e1000e ones but not in some realtek ones). Hope that was not written to confusingly. [0]: https://www.freedesktop.org/software/systemd/man/systemd.link.html#MACAddressPolicy=I see, thank you for the details. Just to clarify I'm not against fixing it or against this patch, the question was out of curiousity only, as for the patch it needs to be fixed so unsync will be handled in the error paths after the sync and also I'd suggest changing the error message to contain
Ah sorry, somehow I thought there was already an unsync reachable in that code path, but I was wrong. Looks like I can just add the unsync before the list_del in err7 since list_add happens pretty much right before the sync. I'll test with a knob to force a failure, I still have my patched qemu to observe what happens to the mac list on the NIC :-)
what exactly couldn't be synced: "failed to sync bridge static fdb addresses to this port"
Yeah that sounds better! Will change it in v2.
or something in those lines. Since this fixes actual bug please also add a Fixes: tag with the appropriate commit id where it was introduced.
I was a bit hesitant at first about adding this, since I hadn't done any
before/after testing with the particular commit introducing the change,
though I'm fairly confident about that by now (maybe more so since the
`auto_cnt` condition was wrong (fixed up in e0a47d1f7816 ("bridge: Fix
incorrect judgment of promisc"))).