Re: [PATCH net-next] net: phylink: guard link replay helpers against NULL phylink instance
From: "Russell King (Oracle)" <linux@armlinux.org.uk>
Date: 2026-02-17 15:48:31
Also in:
lkml
On Tue, Feb 17, 2026 at 02:51:25PM +0100, Andrew Lunn wrote:
On Tue, Feb 17, 2026 at 09:22:25AM +0100, Paolo Abeni wrote:quoted
On 2/5/26 8:23 PM, Vladimir Oltean wrote:quoted
There is a crash when unbinding the sja1105 driver under special circumstances: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030 Call trace: phylink_run_resolve_and_disable+0x10/0x90 sja1105_static_config_reload+0xc0/0x410 sja1105_vlan_filtering+0x100/0x140 dsa_port_vlan_filtering+0x13c/0x368 dsa_port_reset_vlan_filtering.isra.0+0xe8/0x198 dsa_port_bridge_leave+0x130/0x248 dsa_user_changeupper.part.0+0x74/0x158 dsa_user_netdevice_event+0x50c/0xa50 notifier_call_chain+0x78/0x148 raw_notifier_call_chain+0x20/0x38 call_netdevice_notifiers_info+0x58/0xa8 __netdev_upper_dev_unlink+0xac/0x220 netdev_upper_dev_unlink+0x38/0x70 del_nbp+0x1a4/0x320 br_del_if+0x3c/0xd8 br_device_event+0xf8/0x2d8 notifier_call_chain+0x78/0x148 raw_notifier_call_chain+0x20/0x38 call_netdevice_notifiers_info+0x58/0xa8 unregister_netdevice_many_notify+0x314/0x848 unregister_netdevice_queue+0xe8/0xf8 dsa_user_destroy+0x50/0xa8 dsa_port_teardown+0x80/0x98 dsa_switch_teardown_ports+0x4c/0xb8 dsa_switch_deinit+0x94/0xb8 dsa_switch_put_tree+0x2c/0xc0 dsa_unregister_switch+0x38/0x60 sja1105_remove+0x24/0x40 spi_remove+0x38/0x60 device_remove+0x54/0x90 device_release_driver_internal+0x1d4/0x230 device_driver_detach+0x20/0x38 unbind_store+0xbc/0xc8 ---[ end trace 0000000000000000 ]--- which requires an explanation. When a port offloads a bridge, the switch must be reset to change the VLAN awareness state (the SJA1105_VLAN_FILTERING reason for sja1105_static_config_reload()). When the port leaves a VLAN-aware bridge, it must also be reset for the same reason: it is returning to operation as a VLAN-unaware standalone port. sja1105_static_config_reload() triggers the phylink link replay helpers. Because sja1105 is a switch, it has multiple user ports. During unbind, ports are torn down one by one in dsa_switch_teardown_ports() -> dsa_port_teardown() -> dsa_user_destroy(). The crash happens when the numerically first user port is not part of the VLAN-aware bridge, but any other user port is. Tearing down the first user port causes phylink_destroy() to be called on dp->pl, and this pointer to be set to NULL. Then, when the second user port is torn down, this was offloading a VLAN-aware bridge port, so indirectly it will trigger sja1105_static_config_reload(). The latter function iterates using dsa_switch_for_each_available_port(), and unconditionally dereferences dp->pl, including for the aforementioned torn down previous port, and passes that to phylink. This is where the NULL pointer is coming from. There are multiple levels at which this could be avoided: - add an "if (dp->pl)" in sja1105_static_config_reload() - make the phylink replay helpers NULL-tolerant - mark ports as DSA_PORT_TYPE_UNUSED after dsa_port_phylink_destroy() has run, such that subsequent dsa_switch_for_each_available_port() iterations skip them - disconnect the entire switch at once from switchdev and NETDEV_CHANGEUPPER events while unbinding, not just port by port, likely using a "ds->unbinding = true" mechanism or similar however options 3 and 4 are quite heavy and might have side effects, option 1 is very unassuming and option 2 seems a more elegant variant of 1, given the fact that sja1105 is the only user of these phylink replay helpers. It allows to keep the driver simple and is the option I went with. Functionally speaking, transforming the replay helpers into no-ops for ports without a phylink instance is fine, because that only happens during driver removal (an operation which cannot be cancelled). The ports are not required to work. Fixes: 0b2edc531e0b ("net: dsa: sja1105: let phylink help with the replay of link callbacks") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>I think this patch could land on current net, but it would be nice an ack from phylib SMEs.Sorry, weekend away. I prefer option 1. I _think_ option 2 only works because the MAC driver set dp->pl to NULL. phylink is not responsible for the NULL, so it seems odd for phylink to assume there is a NULL. Only the MAC driver knows if the MAC driver has set dp->pl to NULL.
I also prefer option 1. Currently, none of phylink's driver API permits struct phylink to be NULL, and I'd prefer that to remain the case for consistency. -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!