Re: [RFC] net: phy: read link status twice when phy_check_link_status()
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: 2019-07-31 05:53:54
Also in:
lkml
On 31.07.2019 05:33, liuyonglong wrote:
On 2019/7/31 3:04, Heiner Kallweit wrote:quoted
On 30.07.2019 08:35, liuyonglong wrote:quoted
:/sys/kernel/debug/tracing$ cat trace # tracer: nop # # entries-in-buffer/entries-written: 45/45 #P:128 # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth # ||| / delay # TASK-PID CPU# |||| TIMESTAMP FUNCTION # | | | |||| | | kworker/64:2-1028 [064] .... 172.295687: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x02 val:0x001c kworker/64:2-1028 [064] .... 172.295726: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x03 val:0xc916 kworker/64:2-1028 [064] .... 172.296902: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x01 val:0x79ad kworker/64:2-1028 [064] .... 172.296938: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x0f val:0x2000 kworker/64:2-1028 [064] .... 172.321213: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x00 val:0x1040 kworker/64:2-1028 [064] .... 172.343209: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x02 val:0x001c kworker/64:2-1028 [064] .... 172.343245: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x03 val:0xc916 kworker/64:2-1028 [064] .... 172.343882: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad kworker/64:2-1028 [064] .... 172.343918: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x0f val:0x2000 kworker/64:2-1028 [064] .... 172.362658: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040 kworker/64:2-1028 [064] .... 172.385961: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x02 val:0x001c kworker/64:2-1028 [064] .... 172.385996: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x03 val:0xc916 kworker/64:2-1028 [064] .... 172.386646: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x01 val:0x79ad kworker/64:2-1028 [064] .... 172.386681: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x0f val:0x2000 kworker/64:2-1028 [064] .... 172.411286: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x00 val:0x1040 kworker/64:2-1028 [064] .... 172.433225: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x02 val:0x001c kworker/64:2-1028 [064] .... 172.433260: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x03 val:0xc916 kworker/64:2-1028 [064] .... 172.433887: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x01 val:0x79ad kworker/64:2-1028 [064] .... 172.433922: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x0f val:0x2000 kworker/64:2-1028 [064] .... 172.452862: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x00 val:0x1040 ifconfig-1324 [011] .... 177.325585: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040 kworker/u257:0-8 [012] .... 177.325642: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x04 val:0x01e1 kworker/u257:0-8 [012] .... 177.325654: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1 kworker/u257:0-8 [012] .... 177.325708: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad kworker/u257:0-8 [012] .... 177.325744: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x09 val:0x0200 kworker/u257:0-8 [012] .... 177.325779: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040 kworker/u257:0-8 [012] .... 177.325788: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1240 kworker/u257:0-8 [012] .... 177.325843: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x798dWhat I think that happens here: Writing 0x1240 to BMCR starts aneg. When reading BMSR immediately after that then the PHY seems to have cleared the "aneg complete" bit already, but not yet the "link up" bit. This results in the false "link up" notification. The following patch is based on the fact that in case of enabled aneg we can't have a valid link if aneg isn't finished. Could you please test whether this works for you?diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 6b5cb87f3..7ddd91df9 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c@@ -1774,6 +1774,12 @@ int genphy_update_link(struct phy_device *phydev) phydev->link = status & BMSR_LSTATUS ? 1 : 0; phydev->autoneg_complete = status & BMSR_ANEGCOMPLETE ? 1 : 0; + /* Consider the case that autoneg was started and "aneg complete" + * bit has been reset, but "link up" bit not yet. + */ + if (phydev->autoneg == AUTONEG_ENABLE && !phydev->autoneg_complete) + phydev->link = 0; + return 0; } EXPORT_SYMBOL(genphy_update_link);This patch can solve the issue! Will it be upstream?
I'll check for side effects, but in general: yes.
So it's nothing to do with the bios, and just the PHY's own behavior, the "link up" bit can not reset immediately,right?
Yes, it's the PHY's own behavior, and to a certain extent it may depend on speed of the MDIO bus. At least few network chips require a delay of several microseconds after each MDIO bus access. This may be sufficient for the PHY to reset the link-up bit in time.
ps: It will take 1 second more to make the link up for RTL8211F when 0x798d happend.
In polling mode link-up is detected up to 1s after it happened. You could switch to interrupt mode to reduce the aneg time.
Heiner