Re: [drivers/net/phy/sfp] intermittent failure in state machine checks
From: ѽ҉ᶬḳ℠ <hidden>
Date: 2020-01-09 23:50:19
On 09/01/2020 23:10, Russell King - ARM Linux admin wrote:
Please don't use mii-tool with SFPs that do not have a PHY; the "PHY" registers are emulated, and are there just for compatibility. Please use ethtool in preference, especially for SFPs.
Sure, but ethtool is not much help for this particular matter: all there is is ethtool -m, and according to you the EEPROM dump is not to be relied on.
CONFIG_DEBUG_GPIO is not the same as having debugfs support enabled. If debugfs is enabled, then gpiolib will provide the current state of gpios through debugfs. debugfs is normally mounted on /sys/kernel/debug, but may not be mounted by default depending on policy. Looking in /proc/filesystems will tell you definitively whether debugfs is enabled or not in the kernel.
debugfs is mounted, but ls -af /sys/kernel/debug/gpio only produces (oddly): /sys/kernel/debug/gpio
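For reference, the two checks discussed here (is debugfs known to the kernel, and what does gpiolib report) can be scripted. The following is only a minimal Python sketch: the function names debugfs_supported and read_gpio_state are hypothetical, and it assumes debugfs is mounted at the usual /sys/kernel/debug. Note that /sys/kernel/debug/gpio is a regular file rather than a directory, so ls simply echoes its path; its contents have to be read instead, usually as root.

```python
from pathlib import Path


def debugfs_supported(proc_filesystems_text: str) -> bool:
    """Return True if 'debugfs' is listed in the text of /proc/filesystems.

    Each line is either '<fsname>' or 'nodev <fsname>', so the filesystem
    name is always the last whitespace-separated field.
    """
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        if fields and fields[-1] == "debugfs":
            return True
    return False


def read_gpio_state(path: str = "/sys/kernel/debug/gpio"):
    """Return the gpiolib state dump as text, or None if it cannot be read.

    The default path assumes debugfs is mounted on /sys/kernel/debug.
    """
    gpio_file = Path(path)
    if not gpio_file.is_file():
        return None
    try:
        return gpio_file.read_text()
    except OSError:
        # Typically a permissions problem: debugfs is often root-only.
        return None


if __name__ == "__main__":
    supported = debugfs_supported(Path("/proc/filesystems").read_text())
    print("debugfs in kernel:", supported)
    state = read_gpio_state()
    print(state if state is not None else "gpio state not readable")
```

If the first check prints True but the second cannot read the file, the likely causes are that debugfs is not mounted at the assumed location or that the script is not running with sufficient privileges.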
So, if that is correct... Current OpenWRT is derived from 4.19-stable kernels, which include experimental patches picked at some point from my "phy" branch, and TOS is derived from OpenWRT.
This may not be correct, since there are not many device targets in OpenWrt that feature an SFP cage (at least as of today); the Turris Omnia might even be the sole one. I did not check whether the code was/is available in OpenWrt, and likely it is not, but it was in an earlier TOS version since their platforms apparently feature an SFP cage.
That makes it very difficult for anyone in the mainline kernel community to do anything about this; sending you a patch is likely useless since you're not going to be able to test it.
I understand. I only reached out all the way upstream because the other available avenues, starting all the way downstream, did not produce anything tangible or even a response. I am grateful that at least you finally obliged and shed some light on the matter. Maybe I should just try finding a module that is declared SFP MSA conformant...
You think the state machines are doing something clever. They don't. They are all very simple and quite dumb.
Not really, I assume they just do what they are supposed to do in line with current (industry) standards and best practices.
The only real way to get to the bottom of it is to manually enable debug in sfp.c so it's possible to watch what happens, not only with the hardware signals but also what the state machines are doing. However, I'm very certain that there is no problem with the state machines, and it is that the Allnet module is raising TX_FAULT.
I am sure it does, and I am pursuing Allnet for a response, although that is not looking promising at the moment. Once there is one, however, I shall pick up the thread again.
I also think from what you've said above that rebuilding a kernel to enable debug in sfp.c is not going to be possible for you.
No, I might be able to get this done for amd64, but with this ARM SoC there are all kinds of other things (SPI, MTD, I2C, u-boot and whatnot) involved, and I am afraid it will go sideways if I attempt compiling.