Thread (14 messages) 14 messages, 3 authors, 2022-05-05

Re: FEC MDIO read timeout on linkup

From: Francesco Dolcini <hidden>
Date: 2022-05-05 17:54:11

On Thu, May 05, 2022 at 07:41:00PM +0200, Andrew Lunn wrote:
On Thu, May 05, 2022 at 10:29:01AM +0200, Francesco Dolcini wrote:
quoted
Hello Andrew and all, I believe I finally found the problem and I'm
preparing a patch for it.

On Wed, May 04, 2022 at 12:17:59AM +0200, Andrew Lunn wrote:
quoted
quoted
I'm wondering could this be related to
fec_enet_adjust_link()->fec_restart() during a fec_enet_mdio_read()
and one of the many register write in fec_restart() just creates the
issue, maybe while resetting the FEC? Does this makes any sense?
phylib is 'single threaded', in that only one thing will be active at
once for a PHY. While fec_enet_adjust_link() is being called, there
will not be any read/writes occurring for that PHY.
I think this is not the whole story here. We can have a phy interrupt
handler that runs in its own context and it could be doing a MDIO
transaction, and this is exactly my case.

Thread 1 (phylib WQ)       | Thread 2 (phy interrupt)
                           |
                           | phy_interrupt()            <-- PHY IRQ
	                   |  handle_interrupt()
	                   |   phy_read()
	                   |   phy_trigger_machine()
	                   |    --> schedule WQ
                           |
	                   |
phy_state_machine()        |                        
 phy_check_link_status()   |
  phy_link_change()        |
   phydev->adjust_link()   |
    fec_enet_adjust_link() | 
     --> FEC reset         | phy_interrupt()            <-- PHY IRQ
	                   |  phy_read()
	 	           |

To confirm this I have added a spinlock to detect this race condition
with just a trylock and a WARN_ON(1) when the locking is failing. On
"MDIO read timeout" acquiring the spinlock fails.

This is also in agreement with the fact that polling the PHY instead of
having the interrupt is working just fine.
Yes, that makes sense.

But i would fix this differently. The interrupt handler runs in a
threaded interrupt. So it can use mutex. So it should actually take
the phy mutex.
I was just about to send a patch with phy_lock_mdio_bus() in
fec_enet_adjust_link(), anyway, I'll send the version you proposed in a
bit.

Thanks,
Francesco
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help