Re: [PATCH net v2] net: bridge: move vlan init/deinit to NETDEV_REGISTER/UNREGISTER
From: Nikolay Aleksandrov <hidden>
Date: 2019-07-31 22:40:10
Also in:
bridge
On 01/08/2019 01:37, Nikolay Aleksandrov wrote:
Most of the bridge device's vlan init bugs come from the fact that it's done in the wrong place, way too early in ndo_init() before the device is even assigned an ifindex. That makes error handling harder, especially for older kernels which don't have bridge ndo_uninit callback. It also introduces another bug when the bridge's dev_addr is added as fdb in the the initial default pvid on vlan initialization, the fdb notification has ifindex/NDA_MASTER both equal to 0 (see example below) which really makes no sense for user-space[0]. Usually user-space software would ignore such entries, but they are actually valid and will eventually have all necessary attributes. I chose to change the order because this can be backported to all kernels even pre-ndo_uninit ones without many changes and it keeps init/deinit symmetric. As a bonus this allows us to keep the vlan init/deinit entirely in br_vlan.c and remove those exports. It makes much more sense to send a notification *after* the device has registered and has a proper ifindex allocated rather than before when there's a chance that the registration might still fail. For the demonstration below a small change to iproute2 for printing all fdb notifications is added, because it contained a workaround not to show entries with ifindex == 0. Command executed while monitoring: $ ip l add br0 type bridge Before (both ifindex and master == 0): $ bridge monitor fdb 36:7e:8a:b3:56:ba dev * vlan 1 master * permanent After (proper br0 ifindex): $ bridge monitor fdb e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent v2: on error in br_vlan_init set br->vlgrp to NULL [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389 Reported-by: michael-dev <redacted> Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid") Signed-off-by: Nikolay Aleksandrov <redacted> --- I tried a few different approaches to resolve this but they were all unsuitable for some kernels, this approach can go to stables easily and IMO is the way this had to be done from the start. Alternatively we could move only the br_vlan_add and pair it with a br_vlan_del of default_pvid on the same events, but I don't think it hurts to move the whole init/deinit there as it'd help older stable releases as well. I also tested the br_vlan_init error handling after the move by always returning errors from all over it. Since errors at NETDEV_REGISTER cause NETDEV_UNREGISTER we can deinit vlans properly for all cases regardless why it happened (e.g. device destruction or init error). net/bridge/br.c | 5 ++++- net/bridge/br_device.c | 10 ---------- net/bridge/br_private.h | 19 ++++--------------- net/bridge/br_vlan.c | 25 ++++++++++++++++++------- 4 files changed, 26 insertions(+), 33 deletions(-)
Aargh, I apologize for the noise, this is the wrong v2 patch... Will send the correct one as v3 in a moment.