Re: [RFC] net: phy: read link status twice when phy_check_link_status()
From: liuyonglong <hidden>
Date: 2019-07-30 06:35:32
Also in:
lkml
:/sys/kernel/debug/tracing$ cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 45/45 #P:128
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
kworker/64:2-1028 [064] .... 172.295687: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x02 val:0x001c
kworker/64:2-1028 [064] .... 172.295726: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x03 val:0xc916
kworker/64:2-1028 [064] .... 172.296902: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x01 val:0x79ad
kworker/64:2-1028 [064] .... 172.296938: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x0f val:0x2000
kworker/64:2-1028 [064] .... 172.321213: mdio_access: mii-0000:bd:00.0 read phy:0x01 reg:0x00 val:0x1040
kworker/64:2-1028 [064] .... 172.343209: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x02 val:0x001c
kworker/64:2-1028 [064] .... 172.343245: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x03 val:0xc916
kworker/64:2-1028 [064] .... 172.343882: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad
kworker/64:2-1028 [064] .... 172.343918: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x0f val:0x2000
kworker/64:2-1028 [064] .... 172.362658: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040
kworker/64:2-1028 [064] .... 172.385961: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x02 val:0x001c
kworker/64:2-1028 [064] .... 172.385996: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x03 val:0xc916
kworker/64:2-1028 [064] .... 172.386646: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x01 val:0x79ad
kworker/64:2-1028 [064] .... 172.386681: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x0f val:0x2000
kworker/64:2-1028 [064] .... 172.411286: mdio_access: mii-0000:bd:00.2 read phy:0x05 reg:0x00 val:0x1040
kworker/64:2-1028 [064] .... 172.433225: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x02 val:0x001c
kworker/64:2-1028 [064] .... 172.433260: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x03 val:0xc916
kworker/64:2-1028 [064] .... 172.433887: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x01 val:0x79ad
kworker/64:2-1028 [064] .... 172.433922: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x0f val:0x2000
kworker/64:2-1028 [064] .... 172.452862: mdio_access: mii-0000:bd:00.3 read phy:0x07 reg:0x00 val:0x1040
ifconfig-1324 [011] .... 177.325585: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040
kworker/u257:0-8 [012] .... 177.325642: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x04 val:0x01e1
kworker/u257:0-8 [012] .... 177.325654: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x04 val:0x05e1
kworker/u257:0-8 [012] .... 177.325708: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad
kworker/u257:0-8 [012] .... 177.325744: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x09 val:0x0200
kworker/u257:0-8 [012] .... 177.325779: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040
kworker/u257:0-8 [012] .... 177.325788: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1240
kworker/u257:0-8 [012] .... 177.325843: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x798d
kworker/u257:0-8 [003] .... 178.360488: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x7989
kworker/u257:0-8 [000] .... 179.384479: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x7989
kworker/u257:0-8 [000] .... 180.408477: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x7989
kworker/u257:0-8 [000] .... 181.432474: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79a9
kworker/u257:0-8 [000] .... 181.432510: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x0a val:0x7800
kworker/u257:0-8 [000] .... 181.432546: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x09 val:0x0200
kworker/u257:0-8 [000] .... 181.432582: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x05 val:0xc1e1
kworker/u257:0-8 [000] .... 182.456510: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad
kworker/u257:0-8 [000] .... 182.456546: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x0a val:0x4800
kworker/u257:0-8 [000] .... 182.456582: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x09 val:0x0200
kworker/u257:0-8 [000] .... 182.456618: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x05 val:0xc1e1
kworker/u257:0-8 [001] .... 183.480476: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad
kworker/u257:0-8 [000] .... 184.504478: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad
kworker/u257:0-8 [000] .... 185.528486: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad
kworker/u257:0-8 [000] .... 186.552475: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x01 val:0x79ad
ifconfig-1327 [011] .... 187.196036: mdio_access: mii-0000:bd:00.1 read phy:0x03 reg:0x00 val:0x1040
ifconfig-1327 [011] .... 187.196046: mdio_access: mii-0000:bd:00.1 write phy:0x03 reg:0x00 val:0x1840
On 2019/7/30 14:08, Heiner Kallweit wrote:On 30.07.2019 06:03, liuyonglong wrote:quoted
On 2019/7/30 4:57, Heiner Kallweit wrote:quoted
On 29.07.2019 05:59, liuyonglong wrote:quoted
On 2019/7/27 2:14, Heiner Kallweit wrote:quoted
On 26.07.2019 11:53, Yonglong Liu wrote:quoted
According to the datasheet of Marvell phy and Realtek phy, the copper link status should read twice, or it may get a fake link up status, and cause up->down->up at the first time when link up. This happens more oftem at Realtek phy.This is not correct, there is no fake link up status. Read the comment in genphy_update_link, only link-down events are latched. Means if the first read returns link up, then there is no need for a second read. And in polling mode we don't do a second read because we want to detect also short link drops. It would be helpful if you could describe your actual problem and whether you use polling or interrupt mode.[ 44.498633] hns3 0000:bd:00.1 eth5: net open [ 44.504273] hns3 0000:bd:00.1: reg=0x1, data=0x79ad -> called from phy_start_aneg [ 44.532348] hns3 0000:bd:00.1: reg=0x1, data=0x798d -> called from phy_state_machine,update link.This should not happen. The PHY indicates link up w/o having aneg finished.quoted
According to the datasheet: reg 1.5=0 now, means copper auto-negotiation not complete reg 1.2=1 now, means link is up We can see that, when we read the link up, the auto-negotiation is not complete yet, so the speed is invalid. I don't know why this happen, maybe this state is keep from bios? Or we may do something else in the phy initialize to fix it? And also confuse that why read twice can fix it?I suppose that basically any delay would do.quoted
[ 44.554063] hns3 0000:bd:00.1: invalid speed (-1) [ 44.560412] hns3 0000:bd:00.1 eth5: failed to adjust link. [ 45.194870] hns3 0000:bd:00.1 eth5: link up [ 45.574095] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989 [ 46.150051] hns3 0000:bd:00.1 eth5: link down [ 46.598074] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x7989 [ 47.622075] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79a9 [ 48.646077] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79ad [ 48.934050] hns3 0000:bd:00.1 eth5: link up [ 49.702140] hns3 0000:bd:00.1: phyid=3, reg=0x1, data=0x79adquoted
quoted
I add a fake status read, and can solve this problem. I also see that in genphy_update_link(), had delete the fake read in polling mode, so I don't know whether my solution is correct.Can you test whether the following fixes the issue for you? Also it would be interesting which exact PHY models you tested and whether you built the respective PHY drivers or whether you rely on the genphy driver. Best use the second patch to get the needed info. It may make sense anyway to add the call to phy_attached_info() to the hns3 driver.[ 40.100716] RTL8211F Gigabit Ethernet mii-0000:bd:00.3:07: attached PHY driver [RTL8211F Gigabit Ethernet] (mii_bus:phy_addr=mii-0000:bd:00.3:07, irq=POLL) [ 40.932133] hns3 0000:bd:00.3 eth7: net open [ 40.932458] hns3 0000:bd:00.3: invalid speed (-1) [ 40.932541] 8021q: adding VLAN 0 to HW filter on device eth7 [ 40.937149] hns3 0000:bd:00.3 eth7: failed to adjust link. [ 40.662050] Generic PHY mii-0000:bd:00.2:05: attached PHY driver [Generic PHY] (mii_bus:phy_addr=mii-0000:bd:00.2:05, irq=POLL) [ 41.563511] hns3 0000:bd:00.2 eth6: net open [ 41.563853] hns3 0000:bd:00.2: invalid speed (-1) [ 41.563943] 8021q: adding VLAN 0 to HW filter on device eth6 [ 41.568578] IPv6: ADDRCONF(NETDEV_CHANGE): eth6: link becomes ready [ 41.568898] hns3 0000:bd:00.2 eth6: failed to adjust link. I am using RTL8211F, but you can see that, both genphy driver and RTL8211F driver have the same issue.quoted
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index 6b5cb87f3..fbecfe210 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c@@ -1807,7 +1807,8 @@ int genphy_read_status(struct phy_device *phydev) linkmode_zero(phydev->lp_advertising); - if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) { + if (phydev->autoneg == AUTONEG_ENABLE && + (phydev->autoneg_complete || phydev->link)) { if (phydev->is_gigabit_capable) { lpagb = phy_read(phydev, MII_STAT1000); if (lpagb < 0)I have try this patch, have no effect. I suppose that at this time, the autoneg actually not complete yet. Maybe the wrong phy state is passed from bios? Invalid speed just happen at the first time when ethX up, after that, repeat the ifconfig down/ifconfig up command can not see that again. So I think the bios should power off the phy(writing reg 1.11 to 1) before it starts the OS? Or any other way to fix this in the OS?To get a better idea of whats going on it would be good to see a full MDIO trace. Can you enable MDIO tracing via the following sysctl file /sys/kernel/debug/tracing/events/mdio/enable and provide the generated trace? Due to polling mode each second entries will be generated, so you better stop network after the issue occurred.quoted
Heiner .