Thread (26 messages) 26 messages, 4 authors, 2022-03-31

Re: ordering of call to unbind() in usbnet_disconnect

From: Andrew Lunn <andrew@lunn.ch>
Date: 2022-03-26 12:44:25

On Sat, Mar 26, 2022 at 01:25:52PM +0100, Lukas Wunner wrote:
On Tue, Mar 15, 2022 at 12:38:41PM +0100, Oleksij Rempel wrote:
quoted
On Tue, Mar 15, 2022 at 09:32:34AM +0100, Lukas Wunner wrote:
quoted
quoted
quoted
quoted
quoted
On Thu, Mar 10, 2022 at 12:25:08PM +0100, Oliver Neukum wrote:
quoted
I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
unregister_netdev()") is causing regressions.
Is it illegal to disconnect a PHY from an unregistered, but not yet freed
net_device?

Oleksij, the commit message of 2c9d6c2b871d says that disconnecting the
PHY "fails" in that situation.  Please elaborate what the failure looked
like.  Did you get a stacktrace?
Oleksij, I cannot reproduce your stacktrace (included in full length below).
I've tested with kernel 5.13 (since the stacktrace was with 5.13-rc3)
with all your (and other people's) asix patches applied on top,
except for 2c9d6c2b871d.  Tried unplugging an AX88772A multiple times,
never got a stacktrace.

I've also walked down the code paths from usbnet_disconnect() and cannot
see how the stacktrace could occur.

Normally an unregistering netdev is removed from the linkwatch event list
(lweventlist) via this call stack:

          usbnet_disconnect()
            unregister_netdev()
              rtnl_unlock()
                netdev_run_todo()
                  netdev_wait_allrefs()
                    linkwatch_forget_dev()
                      linkwatch_do_dev()

For the stacktrace to occur, the netdev would have to be subsequently
re-added to the linkwatch event list via linkwatch_fire_event().
Hi Lukas

What you might be missing is a call to phy_error()
 
That is called, among other places, from netif_carrier_off().  However,
netif_carrier_off() is already called *before* linkwatch_forget_dev()
when unregister_netdev() stops the netdev before unregistering it:

          usbnet_disconnect()
            unregister_netdev()
              unregister_netdevice()
                unregister_netdevice_queue(dev, NULL)
                  unregister_netdevice_many()
                    dev_close_many()
                      __dev_close_many()
                        usbnet_stop()
                          ax88772_stop()
                            phy_stop() # state = PHY_HALTED
                              phy_state_machine()
I'm guessing somewhere around here:

If it calls into the PHY driver, and the PHY calls for an MDIO bus
transaction, and that returns an error, -ENODEV or -EIO for example,
because the USB device has gone away, and that results in a call to
phy_error().

void phy_error(struct phy_device *phydev)
{
        WARN_ON(1);

        mutex_lock(&phydev->lock);
        phydev->state = PHY_HALTED;
        mutex_unlock(&phydev->lock);

        phy_trigger_machine(phydev);
}

That will trigger the PHY state machine to run again, asynchronously.

The end of phy_stop() says:

        /* Cannot call flush_scheduled_work() here as desired because
         * of rtnl_lock(), but PHY_HALTED shall guarantee irq handler
         * will not reenable interrupts.
         */

so it looks like the state machine will run again, and potentially use
netdev.

If the MDIO bus driver is no longer returning ENODEV, than we should
avoid this.

      Andrew
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help