Thread (18 messages) 18 messages, 5 authors, 2017-12-04

Re: [RFC net-next 4/4] net: phy: Correctly process PHY_HALTED in phy_stop_machine()

From: Geert Uytterhoeven <geert@linux-m68k.org>
Date: 2017-11-06 15:50:13

Hi Florian,

On Tue, Oct 31, 2017 at 5:33 PM, Florian Fainelli [off-list ref] wrote:
On 10/31/2017 08:26 AM, Geert Uytterhoeven wrote:
quoted
On Mon, Oct 30, 2017 at 5:09 PM, Florian Fainelli [off-list ref] wrote:
quoted
On 10/30/2017 06:56 AM, Geert Uytterhoeven wrote:
quoted
On Thu, Oct 26, 2017 at 1:21 AM, Florian Fainelli [off-list ref] wrote:
quoted
Marc reported that he was not getting the PHY library adjust_link()
callback function to run when calling phy_stop() + phy_disconnect()
which does not indeed happen because we set the state machine to
PHY_HALTED but we don't get to run it to process this state past that
point.

Fix this with a synchronous call to phy_state_machine() in order to have
the state machine actually act on PHY_HALTED, set the PHY device's link
down, turn the network device's carrier off and finally call the
adjust_link() function.

At the end of phy_state_machine() though, if we are going to be moving
from PHY_HALTED to PHY_HALTED, do not reschedule the state machine, this
is pointless.

Reported-by: Marc Gonzalez <redacted>
Fixes: a390d1f379cf ("phylib: convert state_queue work to delayed_work")
Signed-off-by: Marc Gonzalez <redacted>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Thanks for your patch!

Unfortunately, after applying this one, the last in your series, both
sh73a0/kzm9g and r8a73a4/ape6evm start crashing again in the system
suspend/resume path, due to register accesses while the device is already
suspended:
OK, seems like there is another path, uncovered by this patch that we
can be hitting, does the following patch below help?
Unfortunately it doesn't help.
OK :/
quoted
quoted
quoted
Unhandled fault: imprecise external abort (0x1406) at 0x0005b950
Note that this is an imprecise external abort, i.e. it's reporting may
be delayed,
and the backtrace may be inaccurate.
True, can you help narrow it down with me? Can you confirm that
adjust_link() (assuming that is the problem) does not get called past
phy_stop_machine() as it should?
I've added some additional debug checks (keep track of both phy and
smsc state, and refuse the access registers if smsc is disabled).

Apparently phy_stop_machine() is called twice:
  - Once from mdio_bus_phy_suspend(), cfr. the first backtrace,
  - A second time from smsc911x_suspend(), cfr. the second backtrace.

The second call causes a call to smsc911x_phy_adjust_link() while the smsc is
already disabled, cfr. the third backtrace. This would trigger the imprecise
external abort if I let it access the registers.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1083 at drivers/net/phy/phy.c:597
phy_stop_machine+0x44/0xcc
phy_stop_machine: phy running, good
CPU: 0 PID: 1083 Comm: bash Not tainted
4.14.0-rc7-ape6evm-00443-gcdfc0e18a47e0bb3-dirty #637
Hardware name: Generic R8A73A4 (Flattened Device Tree)
[<c0018f3c>] (unwind_backtrace) from [<c0014f40>] (show_stack+0x10/0x14)
[<c0014f40>] (show_stack) from [<c04bd928>] (dump_stack+0xa4/0xdc)
[<c04bd928>] (dump_stack) from [<c002ac28>] (__warn+0xcc/0xfc)
[<c002ac28>] (__warn) from [<c002ac8c>] (warn_slowpath_fmt+0x34/0x44)
[<c002ac8c>] (warn_slowpath_fmt) from [<c0314680>] (phy_stop_machine+0x44/0xcc)
[<c0314680>] (phy_stop_machine) from [<c03162ec>]
(mdio_bus_phy_suspend+0x24/0x40)
[<c03162ec>] (mdio_bus_phy_suspend) from [<c0307090>]
(dpm_run_callback+0x17c/0x3ec)
[<c0307090>] (dpm_run_callback) from [<c030a584>] (__device_suspend+0x498/0x6b0)
[<c030a584>] (__device_suspend) from [<c030a974>] (dpm_suspend+0x1d8/0x568)
[<c030a974>] (dpm_suspend) from [<c00880bc>]
(suspend_devices_and_enter+0x78/0xe98)
[<c00880bc>] (suspend_devices_and_enter) from [<c008991c>]
(pm_suspend+0xa40/0xbec)
[<c008991c>] (pm_suspend) from [<c0086e68>] (state_store+0xac/0xcc)
[<c0086e68>] (state_store) from [<c01c7780>] (kernfs_fop_write+0x190/0x1d0)
[<c01c7780>] (kernfs_fop_write) from [<c0153418>] (__vfs_write+0x20/0x11c)
[<c0153418>] (__vfs_write) from [<c0153688>] (vfs_write+0xb8/0x144)
[<c0153688>] (vfs_write) from [<c0153808>] (SyS_write+0x40/0x80)
[<c0153808>] (SyS_write) from [<c0010200>] (ret_fast_syscall+0x0/0x28)
---[ end trace 8fc4c71351438007 ]---
libphy: phy_stop_machine: Kicking state machine synchronously
libphy: phy_stop_machine: Kicking state machine done
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1083 at drivers/net/phy/phy.c:598
phy_stop_machine+0x64/0xcc
phy_stop_machine: phy already stopped
CPU: 0 PID: 1083 Comm: bash Tainted: G        W
4.14.0-rc7-ape6evm-00443-gcdfc0e18a47e0bb3-dirty #637
Hardware name: Generic R8A73A4 (Flattened Device Tree)
[<c0018f3c>] (unwind_backtrace) from [<c0014f40>] (show_stack+0x10/0x14)
[<c0014f40>] (show_stack) from [<c04bd928>] (dump_stack+0xa4/0xdc)
[<c04bd928>] (dump_stack) from [<c002ac28>] (__warn+0xcc/0xfc)
[<c002ac28>] (__warn) from [<c002ac8c>] (warn_slowpath_fmt+0x34/0x44)
[<c002ac8c>] (warn_slowpath_fmt) from [<c03146a0>] (phy_stop_machine+0x64/0xcc)
[<c03146a0>] (phy_stop_machine) from [<c031a448>] (smsc911x_suspend+0x44/0xa4)
[<c031a448>] (smsc911x_suspend) from [<c0307090>] (dpm_run_callback+0x17c/0x3ec)
[<c0307090>] (dpm_run_callback) from [<c030a584>] (__device_suspend+0x498/0x6b0)
[<c030a584>] (__device_suspend) from [<c030a974>] (dpm_suspend+0x1d8/0x568)
[<c030a974>] (dpm_suspend) from [<c00880bc>]
(suspend_devices_and_enter+0x78/0xe98)
[<c00880bc>] (suspend_devices_and_enter) from [<c008991c>]
(pm_suspend+0xa40/0xbec)
[<c008991c>] (pm_suspend) from [<c0086e68>] (state_store+0xac/0xcc)
[<c0086e68>] (state_store) from [<c01c7780>] (kernfs_fop_write+0x190/0x1d0)
[<c01c7780>] (kernfs_fop_write) from [<c0153418>] (__vfs_write+0x20/0x11c)
[<c0153418>] (__vfs_write) from [<c0153688>] (vfs_write+0xb8/0x144)
[<c0153688>] (vfs_write) from [<c0153808>] (SyS_write+0x40/0x80)
[<c0153808>] (SyS_write) from [<c0010200>] (ret_fast_syscall+0x0/0x28)
---[ end trace 8fc4c71351438008 ]---
libphy: phy_stop_machine: Kicking state machine synchronously
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1083 at drivers/net/ethernet/smsc/smsc911x.c:988
smsc911x_phy_adjust_link+0x2c/0x2e0
PHY stopped
CPU: 0 PID: 1083 Comm: bash Tainted: G        W
4.14.0-rc7-ape6evm-00443-gcdfc0e18a47e0bb3-dirty #637
Hardware name: Generic R8A73A4 (Flattened Device Tree)
[<c0018f3c>] (unwind_backtrace) from [<c0014f40>] (show_stack+0x10/0x14)
[<c0014f40>] (show_stack) from [<c04bd928>] (dump_stack+0xa4/0xdc)
[<c04bd928>] (dump_stack) from [<c002ac28>] (__warn+0xcc/0xfc)
[<c002ac28>] (__warn) from [<c002ac8c>] (warn_slowpath_fmt+0x34/0x44)
[<c002ac8c>] (warn_slowpath_fmt) from [<c031b1e8>]
(smsc911x_phy_adjust_link+0x2c/0x2e0)
[<c031b1e8>] (smsc911x_phy_adjust_link) from [<c031378c>]
(phy_link_down+0x18/0x24)
[<c031378c>] (phy_link_down) from [<c0314518>] (phy_state_machine+0x2d0/0x3f4)
[<c0314518>] (phy_state_machine) from [<c03146d8>] (phy_stop_machine+0x9c/0xcc)
[<c03146d8>] (phy_stop_machine) from [<c031a448>] (smsc911x_suspend+0x44/0xa4)
[<c031a448>] (smsc911x_suspend) from [<c0307090>] (dpm_run_callback+0x17c/0x3ec)
[<c0307090>] (dpm_run_callback) from [<c030a584>] (__device_suspend+0x498/0x6b0)
[<c030a584>] (__device_suspend) from [<c030a974>] (dpm_suspend+0x1d8/0x568)
[<c030a974>] (dpm_suspend) from [<c00880bc>]
(suspend_devices_and_enter+0x78/0xe98)
[<c00880bc>] (suspend_devices_and_enter) from [<c008991c>]
(pm_suspend+0xa40/0xbec)
[<c008991c>] (pm_suspend) from [<c0086e68>] (state_store+0xac/0xcc)
[<c0086e68>] (state_store) from [<c01c7780>] (kernfs_fop_write+0x190/0x1d0)
[<c01c7780>] (kernfs_fop_write) from [<c0153418>] (__vfs_write+0x20/0x11c)
[<c0153418>] (__vfs_write) from [<c0153688>] (vfs_write+0xb8/0x144)
[<c0153688>] (vfs_write) from [<c0153808>] (SyS_write+0x40/0x80)
[<c0153808>] (SyS_write) from [<c0010200>] (ret_fast_syscall+0x0/0x28)
---[ end trace 8fc4c71351438009 ]---
libphy: phy_stop_machine: Kicking state machine done

If I revert your "net: smsc911x: Properly manage PHY during suspend/resume",
phy_stop_machine() is no longer called twice, and system suspend works.

However, during resume, smsc911x_mii_read() is called before the
smsc is enabled, cfr. the fourth backtrace:

     WARNING: CPU: 1 PID: 17 at
drivers/net/ethernet/smsc/smsc911x.c:165 __smsc911x_reg_read+0x1c/0x88
    Modules linked in:
    CPU: 1 PID: 17 Comm: kworker/1:0 Tainted: G        W
4.14.0-rc7-kzm9g-00443-gcdfc0e18a47e0bb3-dirty #1013
    Hardware name: Generic SH73A0 (Flattened Device Tree)

smsc hangs of [fec10000.bus, which is started only here --->

    Workqueue: events_power_efficient phy_state_machine
    [<c010f304>] (unwind_backtrace) from [<c010b7ec>] (show_stack+0x10/0x14)
    [<c010b7ec>] (show_stack) from [<c0551ab0>] (dump_stack+0xa4/0xdc)
    [<c0551ab0>] (dump_stack) from [<c0120480>] (__warn+0xcc/0xfc)
    [<c0120480>] (__warn) from [<c0120554>] (warn_slowpath_null+0x1c/0x24)
    [<c0120554>] (warn_slowpath_null) from [<c03ce264>]
(__smsc911x_reg_read+0x1c/0x88)
    [<c03ce264>] (__smsc911x_reg_read) from [<c03cefc8>]
(smsc911x_mac_read+0x4c/0x118)
    [<c03cefc8>] (smsc911x_mac_read) from [<c03cf320>]
(smsc911x_mii_read+0x2c/0xa4)
    [<c03cf320>] (smsc911x_mii_read) from [<c03cd490>] (mdiobus_read+0x58/0x70)
    [<c03cd490>] (mdiobus_read) from [<c03cc4a0>] (genphy_update_link+0x18/0x50)
    [<c03cc4a0>] (genphy_update_link) from [<c03cc4e4>]
(genphy_read_status+0xc/0x1cc)
    [<c03cc4e4>] (genphy_read_status) from [<c03ca3e4>]
(phy_state_machine+0xa8/0x3f4)
    [<c03ca3e4>] (phy_state_machine) from [<c013a740>]
(process_one_work+0x240/0x3fc)
    [<c013a740>] (process_one_work) from [<c013af28>]
(worker_thread+0x2cc/0x40c)
    [<c013af28>] (worker_thread) from [<c014024c>] (kthread+0x124/0x144)
    [<c014024c>] (kthread) from [<c01071c8>] (ret_from_fork+0x14/0x2c)
    ---[ end trace 21b7024e273f9f21 ]---
    ------------[ cut here ]------------

(yes, the fourth backtrace is from another machine, but I can trigger
all of this
 on both r8a73a4/ape6evm and sh73a0/kzm9g anyway).

Gr{oetje,eeting}s,

                        Geert
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help