Thread (26 messages) 26 messages, 4 authors, 2022-03-31

Re: ordering of call to unbind() in usbnet_disconnect

From: Lukas Wunner <lukas@wunner.de>
Date: 2022-03-26 13:01:53

On Sat, Mar 26, 2022 at 01:44:12PM +0100, Andrew Lunn wrote:
On Sat, Mar 26, 2022 at 01:25:52PM +0100, Lukas Wunner wrote:
quoted
Oleksij, I cannot reproduce your stacktrace (included in full length below).
I've tested with kernel 5.13 (since the stacktrace was with 5.13-rc3)
with all your (and other people's) asix patches applied on top,
except for 2c9d6c2b871d.  Tried unplugging an AX88772A multiple times,
never got a stacktrace.

I've also walked down the code paths from usbnet_disconnect() and cannot
see how the stacktrace could occur.

Normally an unregistering netdev is removed from the linkwatch event list
(lweventlist) via this call stack:

          usbnet_disconnect()
            unregister_netdev()
              rtnl_unlock()
                netdev_run_todo()
                  netdev_wait_allrefs()
                    linkwatch_forget_dev()
                      linkwatch_do_dev()

For the stacktrace to occur, the netdev would have to be subsequently
re-added to the linkwatch event list via linkwatch_fire_event().
What you might be missing is a call to phy_error()
But phy_error() has a WARN_ON(1) right at its top.  So it produces
a stacktrace itself.  That stacktrace is nowhere to be seen in the
dmesg output Oleksij posted.  Hence it can't be caused by phy_error().

Also, recall that unregister_netdev() stops the netdev before
unregistering it.  That in turn causes an invocation of phy_stop()
via ax88772_stop().  phy_stop() already puts the PHY into PHY_HALTED
state and resets phydev->link = 0.  So a subsequent phy_error() cannot
result in a call to phy_link_down() (which would indeed trigger a
dangerous linkwatch_fire_event()).

quoted
That is called, among other places, from netif_carrier_off().  However,
netif_carrier_off() is already called *before* linkwatch_forget_dev()
when unregister_netdev() stops the netdev before unregistering it:

          usbnet_disconnect()
            unregister_netdev()
              unregister_netdevice()
                unregister_netdevice_queue(dev, NULL)
                  unregister_netdevice_many()
                    dev_close_many()
                      __dev_close_many()
                        usbnet_stop()
                          ax88772_stop()
                            phy_stop() # state = PHY_HALTED
                              phy_state_machine()
I'm guessing somewhere around here:

If it calls into the PHY driver, and the PHY calls for an MDIO bus
transaction, and that returns an error, -ENODEV or -EIO for example,
because the USB device has gone away, and that results in a call to
phy_error().
Oleksij amended phy_state_machine() to bail out if err == -ENODEV
with commit 06edf1a940be ("net: phy: do not print dump stack if device
was removed").  The commit skips the phy_error() on -ENODEV, which
makes a lot of sense.

Thanks,

Lukas
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help